RESEARCH ON THE MULTIPLE-CHOICE TEST ITEM IN JAPAN:
TOWARD THE VALIDATION OF MATHEMATICAL MODELS



ABSTRACT 

This monograph reports research related to the multiple-choice test
item that has been conducted by psychometricians and educational
technologists in Japan. Sato's number of hypothetical, equivalent
alternatives is introduced. The author proposes a new index, k*, which
can be used, among other things, for invalidating three-parameter models
for the multiple-choice item. Shiba's research on the measurement of
vocabulary, which is based upon latent trait theory, includes an eventual
tailored test on vocabulary, utilizing information obtained from
distractors as well as correct answers. With this research in mind, the
author has developed basic ideas about a new family of models for the
multiple-choice item. These are based upon the information given by both
the distractors and the correct answer, and upon the noise resulting from
random guessing.



PREFACE 



In the summer of 1979, I spent a few weeks in Tokyo under the
sponsorship of the Office of Naval Research (ONR). This monograph is
based on conferences with researchers in Japan, in the areas of
psychometrics, educational measurement, and educational technologies, and
on research materials and technical literature collected during this trip.
I thank Dr. Rudolph J. Marcus, Scientific Director, Miss Eunice Mohri,
and other ONR/Tokyo staff members for providing me with office space
and services, taking me to JICST, and helping me in many other ways.

I was invited to one of the bimonthly meetings of the Educational
Technology Group of the Institute of Electronics and Communication
Engineers in Japan, which was held at the Central Research Laboratories
of Nippon Electric Co., Ltd., on 23 July, 1979, and had an opportunity
to talk with the researchers who came to the meeting from many different
districts of Japan. The author is thankful to Dr. Takahiro Sato, the
representative of the Group, and other members for their kind cooperation
in collecting research materials and literature.

It was also a pleasure to have several conferences with Dr.
Sukeyori Shiba, Professor of Education at the University of Tokyo and an
old friend of mine, during my stay in Tokyo, and to get to know a
large-scale research project on the measurement of vocabulary conducted
by him and his students. The author is thankful to him and his students
for making copies of their research materials and sending them to
Knoxville, Tennessee, after I returned.

Because of the shortage of time, the author could not see all
the people she had wanted to; among them are Professor Takeuchi of the
University of Tokyo and Dr. Akaike of the Institute of Mathematical
Statistics, who happened to be out of town during her stay in Tokyo.

The stimulation of these conversations, and of the research 
materials and literature obtained in Tokyo, started new trains of 
thought in the author's mind. Some of these concern the multiple- 
choice item, which is the subject of this monograph. Others require 
yet more work and further communication with Japanese colleagues. In 
particular, the author feels it is worth trying to reanalyze the
vocabulary test data collected by Shiba and others, using theory and methods
which the author has developed and is going to develop. 

The author is thankful to the Office of Naval Research for this
opportunity of visiting Tokyo, and hopes that the present report will 
contribute to the development of mental test theory and science in 
general. 

Fumiko Samejima 

I Introduction

No psychometrician will doubt that good mental test items are
informative items, which contribute a great deal to the estimation
of the examinee's ability and, therefore, uncover the individual
differences among the examinees accurately. In the history of mental
test theory, the multiple-choice item arrived later than the
free-response item, out of the necessity of administering group tests
and of scoring their results speedily and objectively, in the sense
that no subjective judgment or evaluation is needed in scoring. Today,
an enormous number of multiple-choice tests are administered to
youngsters, and their results have been used in many important
decision-making situations, such as guidance, selection,
classification, and so on. To construct good multiple-choice test
items and to develop good mental test theory which deals with the
multiple-choice item are, therefore, most important.

Since the multiple-choice item was introduced as a substitute
for the free-response item, it has been treated by mental test
theorists as something which is useful from the practical point of
view, but not quite as good as the free-response item. The
three-parameter logistic, or normal ogive, model, which is widely used
by psychologists and educational psychologists for the
multiple-choice item today, is nothing but a "blurred" image of the
logistic, or normal ogive, model for the free-response item. In other
words, nothing meaningful is added to the original logistic, or normal
ogive, model in the three-parameter logistic, or normal ogive, model;
there is only the additional noise caused by random guessing.

We must stop and think, however, whether the three-parameter
logistic, or normal ogive, model really fits psychological reality,
and whether the multiple-choice test item can be more than a "blurred"
image of the free-response item. The author's answer to the first
question is negative, to the second positive. It is clear in the
author's mind that we need a better model than the three-parameter
logistic, or normal ogive, model for the multiple-choice item, and
that the multiple-choice item can provide us with a larger amount
of information, which results in a more accurate ability estimation,
if we make use of the information given by its distractors, which
the free-response item does not have.

It was interesting to discover that, while very few researchers
in the United States have ever questioned the appropriateness of the
three-parameter logistic, or normal ogive, model for the
multiple-choice item, or have tried to validate it for their research
data, the author's perception is shared by some Japanese researchers.
Some of these are members of a nation-wide research group called
the Educational Technology Group of the Institute of Electronics
and Communication Engineers in Japan. Most of the members of the
group are engineers in computer science, and some of them are
educational psychologists. Tatsuoka has reported their names
and research activities (Tatsuoka, 1979), which are represented
by such topics as the S-P table (Student-Problem table),
the number of hypothetical, equivalent alternatives*, interpretive
structural modeling based on graph theory, and so forth. Some of
their papers, which the author has had the opportunity of reading,
are listed in Appendix III. Their standpoint concerning the
multiple-choice item is based on information theory (e.g., Goldman,
1953), considering that an item is a good one if its expected
uncertainty in the selection of an alternative is high. As the measure
of the quality of an item, the number of hypothetical, equivalent
alternatives (Sato, 1977) is used, which will be introduced in
Chapter 2. One impressive feature of the activities of this group
of researchers is that they do not use computers mechanically,
as many other researchers do, but constantly give teachers feedback
information about the test items, and then obtain the teachers'
feedback based on the content analysis of the items in question, and
so on. Another group is Shiba and his students of the School of
Education, University of Tokyo. They have spent the past several
years developing vocabulary tests, which are aimed at measuring the
vocabulary of subjects of a wide range of ages, collecting data,
constructing an integrated vocabulary scale (Shiba, 197S), and then
constructing a tailored test out of these vocabulary test items,
using the information given by the distractors, as well as the
correct answers, for branching examinees (Shiba, Noguchi and Haebara,
1978). The theory and method used for analyzing their data are
basically the same as those adopted in the research in which the
author was involved (Indow and Samejima, 1962, 1966). The outline of
the work accomplished by Shiba and others will be given in Chapter 6.

*Tatsuoka translated the original word as the effective (or equivalent)
number of options, but the author uses this translation.

With the research conducted by these people as incentives,
the author has integrated her own ideas about mathematical models
and the multiple-choice item. This resulted in a proposed method of
validating, or invalidating, the three-parameter logistic, or normal
ogive, model and the knowledge or random guessing principle, and
eventually in a proposed new family of models for the multiple-choice
item, in which the information given by the distractors is fully
utilized.

II Sato's Number of Hypothetical, Equivalent Alternatives

Let g (=1,2,...,n) be a multiple-choice test item. In the
present paper, however, this symbol g is omitted whenever it is
clear that we deal with only one item. Let i (=1,2,...,m) be
an alternative, or an option, of the multiple-choice item g, and
$p_i$ be the probability with which the examinee selects the
alternative i. The entropy H is defined as the expectation of
$-\log_2 p_i$ such that

(2.1)   $H = -\sum_{i=1}^{m} p_i \log_2 p_i$

for the set of m alternatives of item g. It is obvious from
(2.1) that the entropy H is non-negative, and, if one of the m
alternatives is the sure event with unity as its probability, then
H = 0. Sato's number of hypothetical, equivalent alternatives,
k, is defined by

(2.2)   $k = 2^{H}$

and is used as an index of the effectiveness of the set of m
alternatives for item g in the context of information theory.
Since the entropy H indicates the expected uncertainty of the
set of m events, or alternatives, the set of alternatives is more
informative for a greater value of k.

When the probability $p_i$ is replaced by the frequency ratio,
$P_i$, we can write for the estimate of the entropy such that

(2.3)   $\hat{H} = -\sum_{i=1}^{m} P_i \log_2 P_i$ ,

and for the estimate of k we have

(2.4)   $\hat{k} = 2^{\hat{H}}$ .

We notice that we can obtain the number of hypothetical,
equivalent alternatives k without using the entropy, for we have

(2.5)   $k = 2^{H} = 2^{-\sum_{i=1}^{m} p_i \log_2 p_i} = \prod_{i=1}^{m} p_i^{-p_i} = [\prod_{i=1}^{m} p_i^{p_i}]^{-1}$ .

The quantity in the brackets of the last expression of (2.5) is
a kind of weighted geometric mean of $p_i$. Equation (2.5) also
implies that we can use any base for $\log p_i$, instead of 2.
For convenience, hereafter we shall use e as the base of $\log p_i$,
and use H* instead of H such that

(2.6)   $H^{*} = -\sum_{i=1}^{m} p_i \log_e p_i \geq 0$ ,

which equals zero when one of the alternatives is the sure event, and

(2.7)   $k = e^{H^{*}} \geq 1$ ,

and simply write $\log p_i$ instead of $\log_e p_i$.
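
The computation in (2.3) through (2.7) can be sketched in a few lines
of Python. This is only an illustrative sketch: the function names and
the response counts below are hypothetical and are not taken from
Sato's data, and an alternative with zero frequency is assumed to
contribute nothing to the entropy (the usual convention 0 log 0 = 0).

```python
import math


def frequency_ratios(counts):
    """Convert response counts per alternative into frequency ratios P_i."""
    total = sum(counts)
    return [c / total for c in counts]


def entropy_nats(probs):
    """H* of (2.6): -sum p_i log p_i, natural log; zero terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def equivalent_alternatives(probs):
    """k of (2.7): exp(H*), the number of hypothetical, equivalent alternatives."""
    return math.exp(entropy_nats(probs))


# Hypothetical counts for a five-choice item (assumed for illustration only).
counts = [50, 20, 15, 10, 5]
k_hat = equivalent_alternatives(frequency_ratios(counts))
print(round(k_hat, 3))
```

Because the base of the logarithm cancels in (2.5), the natural-log
form used here gives the same value of k as the base-2 definition (2.2).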

To find out the value of $p_i$ which maximizes H*, and hence
k, we define Q such that

(2.8)   $Q = -\sum_{i=1}^{m} p_i \log p_i + \lambda [\sum_{i=1}^{m} p_i - 1]$ ,

where $\lambda$ is Lagrange's multiplier. Thus the partial derivative of
Q with respect to $p_i$ is given by

(2.9)   $\partial Q / \partial p_i = -[\log p_i + (1/p_i) p_i] + \lambda = -\log p_i + (\lambda - 1)$ .

Setting this derivative equal to zero, we obtain

(2.10)   $\log p_i = \lambda - 1$ ,

which is a constant regardless of the value of i. Since we have

(2.11)   $\sum_{i=1}^{m} p_i = 1$ ,

we obtain

(2.12)   $p_i = 1/m$ .

Thus it is clear that H*, and hence k, is maximal when all the
m alternatives are equally probable, and we can write

(2.13)   $\max(H^{*}) = \log m$

and

(2.14)   $\max(k) = m$ .

Since in the present situation the m events are alternatives,
the values of H* and k are affected by the difficulty level of
item g. Let R be the correct answer to item g, which is given
as one of its alternatives, and $p_R$ be the probability with which
the examinee selects the correct answer R. Figure 2-1 presents
the relationship between the probability $p_R$ and the number of
hypothetical, equivalent alternatives k. In this figure, the
area marked by slanted lines indicates the set of k's which are
less than $\max(k \mid p_R)$ and greater than
$\max[1/p_R, \min(k \mid p_R)]$, and are considered to be reasonable
values of k by Sato and others.

[Figure 2-1 not reproduced; horizontal axis: probability for correct answer]

FIGURE 2-1

Relationship between the Probability with Which the Correct
Answer R Is Selected and the Number of Hypothetical,
Equivalent Alternatives, for Five-Choice Items.
In practice, Figure 2-1 is used by replacing the probability
$p_R$ by the proportion correct, $P_R$, and the number of hypothetical,
equivalent alternatives, k, by its estimate $\hat{k}$. It is well known
that the frequency ratio is both the least squares solution and the
maximum likelihood estimator of the corresponding probability.
It is interesting to note that, in addition, it is the estimator
which minimizes the chi-square statistic. Let us define Q such
that

(2.15)   $Q = \sum_{i=1}^{m} [(N P_i - N p_i)^2 / (N p_i)] + \lambda [\sum_{i=1}^{m} p_i - 1]$ ,

where N is the number of examinees and $\lambda$ is Lagrange's multiplier.
Then we have

(2.16)   $\partial Q / \partial p_i = N[(p_i^2 - P_i^2)/p_i^2] + \lambda = 0$ ,

and

(2.17)   $p_i = [1 + (\lambda/N)]^{-1/2} P_i$ .

Since

(2.18)   $1 = \sum_{i=1}^{m} p_i = [1 + (\lambda/N)]^{-1/2} \sum_{i=1}^{m} P_i = [1 + (\lambda/N)]^{-1/2}$ ,

we obtain

(2.19)   $\lambda = 0$ ,

and from this and (2.17) we can write

(2.20)   $p_i = P_i$ .
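
The result (2.20) can be checked numerically by brute force. The
following sketch, offered only as an illustration, searches a grid of
candidate probability vectors for a hypothetical three-alternative
item and reports the one with the smallest chi-square statistic of
(2.15). The frequency ratios, the sample size, and the grid
resolution are all assumptions chosen so that the observed ratios lie
exactly on the grid.

```python
def pearson_chi_square(observed, candidate, n_examinees):
    """Chi-square statistic of (2.15), without the Lagrange term."""
    return sum(
        (n_examinees * P - n_examinees * p) ** 2 / (n_examinees * p)
        for P, p in zip(observed, candidate)
    )


def grid_minimiser(observed, n_examinees, steps=100):
    """Search all (p_1, p_2, p_3) on a grid with p_i > 0 and sum p_i = 1."""
    best_value, best_p = None, None
    for a in range(1, steps):
        for b in range(1, steps - a):
            p = (a / steps, b / steps, (steps - a - b) / steps)
            value = pearson_chi_square(observed, p, n_examinees)
            if best_value is None or value < best_value:
                best_value, best_p = value, p
    return best_p, best_value


# Hypothetical frequency ratios P_i and sample size N.
P = (0.5, 0.3, 0.2)
best_p, best_value = grid_minimiser(P, n_examinees=200)
print(best_p, best_value)
```

On this grid the minimum is attained where the candidate vector
coincides with the frequency ratios, in agreement with (2.20).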

The translation, "the number of hypothetical, equivalent
alternatives," indicates the number of alternatives in the
hypothetical situation where the entropy H is provided by
alternatives which are equivalent in the uncertainty of occurrence.
Although it is not the direct translation of the original word,
it is used for k in the present paper, for it seems to the author
to describe the original best.

III Information Given by Distractors in the Multiple-Choice Item
and Random Guessing

Sato's number of hypothetical, equivalent alternatives has
been used mainly by the members of the Technical Group of Educational
Technologists in Japan (cf. Tatsuoka, 1979) for the purpose of
analyzing the effectiveness of alternatives in relation to
a relatively small group of examinees. The basic idea behind
this index is that the expected uncertainty of the m events, or
alternatives, should be large, and, therefore, the number of
hypothetical, equivalent alternatives should be close to m. We
notice that:

(1) this concept is strongly population-oriented, unlike those
concepts in latent trait theory,

(2) it is assumed that each examinee tries to answer the item
seriously, without depending upon random guessing,

and

(3) relative to the population of examinees, the existence of
too attractive a distractor is not desirable, since it
tends to reduce the value of k.

Thus, as long as this index is used for the analysis of test items
which are given with careful guidance and supervision to samples
of examinees from a well-defined population, and the findings of
the analysis are not generalized across populations, it will serve
its purpose.

If we generalize this concept and the resultant findings
beyond these restrictions, however, we may be led to completely
false conclusions. To give an extreme example, suppose that none
of our examinees took the test seriously, and each selected one of
the alternatives at random, for each item of the test. In such a case,
regardless of the difficulty level of the item, the number of
hypothetical, equivalent alternatives, k, will be very close to
m for every item! In spite of this superficial success, we have
obtained no information about the individual examinees' ability
levels as the result of testing.
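
The extreme example above is easy to reproduce by simulation. The
sketch below is illustrative only: the number of examinees, the seed,
and the function name are assumptions. Every simulated examinee picks
one of m = 5 alternatives purely at random, and the estimate of k is
computed from the resulting frequency ratios.

```python
import math
import random


def k_hat_from_choices(choices, m):
    """Estimate k = exp(H*) from a list of selected alternative indices."""
    counts = [0] * m
    for choice in choices:
        counts[choice] += 1
    n = len(choices)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return math.exp(entropy)


rng = random.Random(1979)   # fixed seed so that the run is repeatable
m = 5                       # number of alternatives
n_examinees = 5000          # hypothetical sample size

# Nobody takes the item seriously: every response is a pure random guess.
choices = [rng.randrange(m) for _ in range(n_examinees)]
k_hat = k_hat_from_choices(choices, m)
print(round(k_hat, 3))
```

The estimate comes out very close to m, although the responses carry
no information at all about the examinees' ability levels.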

It is also noted that, if the examinee's behavior follows
the knowledge or random guessing principle, i.e., he will answer
correctly if he knows the answer, or guess randomly otherwise, the
value of k tends to be large. In this case, too, our success
in obtaining a large k is only superficial and meaningless.

In addition to the above facts, it is obvious that the value
of the number of hypothetical, equivalent alternatives varies for
different populations, i.e., the same item may have a value of
k which is very close to m for one population of examinees,
and may have a very low value for another population. This may
be due to the difference in the mean ability levels of the two
populations, or to the different forms of the two ability
distributions, or both. Thus, while the index may be useful when we
discuss "how good an item is" in relation to a fixed, specific
population of examinees, it cannot be considered as a parameter
of the item per se. This limitation of the usefulness of k
is of the same kind as that which applies to the reliability
coefficient of the test, i.e., in spite of most psychologists' belief
that the reliability coefficient is one of the most important and
solid properties of the test itself, it heavily depends upon the
specific population of examinees to which the test is administered,
and, therefore, is a dead concept, since the population-free test
information function is sufficient to serve the purpose (Samejima,
1977a).

As a whole, there is no single answer to the question: "Are
items which have high values of the number of hypothetical, equivalent
alternatives good items?" even if we control the testing situation
with respect to the purpose of testing, such as guidance, selection,
etc. This is true even if we restrict the populations of examinees,
and it is mainly because of the noise induced by random guessing.
That is to say, in a general situation of testing, it is hard for
us to determine whether we have accomplished the work by obtaining
a high value of k. In fact, the largest possible value of k
may imply no accomplishment at all, as we have seen in one of the
preceding paragraphs of the present chapter!

In spite of the above limitations, however, the introduction
of the number of hypothetical, equivalent alternatives and its use
by Sato and other researchers of the Technical Group of Educational
Technologists should be well credited, for their vision is
oriented toward the full use of the information given by all the
alternatives of the multiple-choice item. It seems that they
are quite successful in using the index in the small group situation,
such as school classes where instructions are well conveyed and
random guessing is strongly discouraged. This orientation is in
sharp contrast to the attitude of many researchers who are accustomed
to the blind use of the three-parameter logistic model for the
multiple-choice item, without ever stopping to think whether the model
can be validated for their data.

IV Three-Parameter Models in Latent Trait Theory and the Role
of Item Distractors

Let $\theta$ be ability, or latent trait, that we intend to measure
with our test. The three-parameter logistic model, or normal ogive
model, is based upon the knowledge or random guessing principle, i.e.,
the examinee either knows the answer or guesses randomly. Let
$\Psi_g(\theta)$ be the item characteristic function of item g, which
is the conditional probability with which the examinee answers item g
correctly, given $\theta$, in the free-response situation. This is
given by

(4.1)   $\Psi_g(\theta) = (2\pi)^{-1/2} \int_{-\infty}^{a_g(\theta - b_g)} e^{-u^2/2} \, du$

in the normal ogive model, and

(4.2)   $\Psi_g(\theta) = [1 + \exp\{-D a_g(\theta - b_g)\}]^{-1}$

in the logistic model, where $a_g$ is the item discrimination parameter
and $b_g$ is the item difficulty parameter (Lord and Novick, 1968,
Chapter 16), and D in (4.2) is the scaling factor which assumes
1.7 (Birnbaum, 1968) when the logistic model is used as a substitute
for the normal ogive model.

The item characteristic function, $P_g(\theta)$, for the
multiple-choice item in the three-parameter normal ogive, or logistic,
model is defined by

(4.3)   $P_g(\theta) = \Psi_g(\theta) + [1 - \Psi_g(\theta)] c_g = c_g + (1 - c_g) \Psi_g(\theta)$ ,

where $\Psi_g(\theta)$ is given by (4.1) or (4.2) and $c_g$ is a
constant which is called the guessing parameter, and equals $1/m_g$.

It should be noted that, following these models, there is 
no information given by the alternatives other than the correct 
answer, for all the responses to the wrong answers are the result 
of random guessing. Should one of these models be valid for the
item in question, the multiple-choice item would be nothing but 
a poor image of the binary, free-response item, which is contaminated 
by the noise caused by random guessing. 
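
The item characteristic functions (4.1) through (4.3) can be written
down directly. The following is a minimal sketch, assuming only the
Python standard library; the function names are hypothetical, and the
normal ogive is evaluated through the error function rather than by
explicit integration.

```python
import math

D = 1.7  # scaling factor used when the logistic replaces the normal ogive


def psi_normal_ogive(theta, a, b):
    """Free-response item characteristic function (4.1)."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))


def psi_logistic(theta, a, b, scaling=D):
    """Free-response item characteristic function (4.2)."""
    return 1.0 / (1.0 + math.exp(-scaling * a * (theta - b)))


def three_parameter_icf(psi, c):
    """Multiple-choice item characteristic function (4.3): c + (1 - c) * psi."""
    return c + (1.0 - c) * psi


# Five-choice item, so the guessing parameter is c = 1/m = 0.2.
c = 1.0 / 5
for theta in (-2.0, 0.0, 2.0):
    ogive = three_parameter_icf(psi_normal_ogive(theta, a=1.0, b=0.0), c)
    logistic = three_parameter_icf(psi_logistic(theta, a=1.0, b=0.0), c)
    print(theta, round(ogive, 4), round(logistic, 4))
```

Under either model the curve rises from the lower asymptote c to
unity, and nothing in it depends on which wrong alternative was chosen.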

Let j be an individual examinee, and $u_j$ be the binary
item score for the multiple-choice item g. The conditional
expectation and variance of the binary item score u, given $\theta$,
can be written as

(4.4)   $E(u \mid \theta) = P_g(\theta) = c + (1-c)\Psi_g(\theta) = (1/m)[1 + (m-1)\Psi_g(\theta)]$ ,

where c is the simplification of $c_g$, and

(4.5)   $\mathrm{Var}(u \mid \theta) = [(m-1)/m^2][1 - \Psi_g(\theta)][1 + (m-1)\Psi_g(\theta)]$ .

Let $u_{ij}$ be the binary alternative score for the alternative i
obtained by the individual j, for the multiple-choice item g.
Thus we can write

(4.6)   [equation not recoverable from the source]

The conditional expectation and variance of the binary alternative
score $u_i$ ($i \neq R$), given $\theta$, are given by

(4.7)   $E(u_i \mid \theta) = c[1 - \Psi_g(\theta)] = (1/m)[1 - \Psi_g(\theta)]$

and

(4.8)   $\mathrm{Var}(u_i \mid \theta) = (1/m^2)[1 - \Psi_g(\theta)][(m-1) + \Psi_g(\theta)]$ .

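Let X be either u or $u_i$, or any other discrete random variable,
and $p(X)$ and $p(X \mid \theta)$ denote the marginal and conditional
probability functions of X, respectively. Then the relationships among
the conditional and unconditional expectations and variances are given
by

(4.9)   $E(X) = \sum X p(X) = \sum X \int_{-\infty}^{\infty} p(X \mid \theta) f(\theta) \, d\theta = \int_{-\infty}^{\infty} \sum X p(X \mid \theta) f(\theta) \, d\theta$

        $= \int_{-\infty}^{\infty} E(X \mid \theta) f(\theta) \, d\theta = E[E(X \mid \theta)]$

and

(4.10)   $\mathrm{Var}(X) = \sum [X - E(X)]^2 p(X) = \sum [X - E(X)]^2 \int_{-\infty}^{\infty} p(X \mid \theta) f(\theta) \, d\theta$

        $= \int_{-\infty}^{\infty} \sum [X - E(X \mid \theta)]^2 p(X \mid \theta) f(\theta) \, d\theta$

        $+ \int_{-\infty}^{\infty} [E(X \mid \theta) - E(X)]^2 \sum p(X \mid \theta) f(\theta) \, d\theta$

        $= E[\mathrm{Var}(X \mid \theta)] + E[E(X \mid \theta) - E(X)]^2$ .

In particular, we can write

(4.11)   $E(u) = E[E(u \mid \theta)] = \int_{-\infty}^{\infty} P_g(\theta) f(\theta) \, d\theta = p_R$

and

(4.12)   $\mathrm{Var}(u) = E[\mathrm{Var}(u \mid \theta)] + E[E(u \mid \theta) - E(u)]^2$

        $= \int_{-\infty}^{\infty} P_g(\theta)[1 - P_g(\theta)] f(\theta) \, d\theta + \int_{-\infty}^{\infty} [P_g(\theta) - p_R]^2 f(\theta) \, d\theta$

for the binary item score u, and, for the alternative score $u_i$
($i \neq R$),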
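
(4.13)   $E(u_i) = E[E(u_i \mid \theta)] = (1/m) \int_{-\infty}^{\infty} [1 - \Psi_g(\theta)] f(\theta) \, d\theta$

        $= [1/(m-1)] \int_{-\infty}^{\infty} [1 - P_g(\theta)] f(\theta) \, d\theta = [1/(m-1)](1 - p_R)$

and

(4.14)   $\mathrm{Var}(u_i) = E[\mathrm{Var}(u_i \mid \theta)] + E[E(u_i \mid \theta) - E(u_i)]^2$

        $= (1/m^2) \int_{-\infty}^{\infty} [1 - \Psi_g(\theta)][(m-1) + \Psi_g(\theta)] f(\theta) \, d\theta$

        $+ (1/m^2) \int_{-\infty}^{\infty} [1 - \Psi_g(\theta)]^2 f(\theta) \, d\theta - 2 p_i (1/m) \int_{-\infty}^{\infty} [1 - \Psi_g(\theta)] f(\theta) \, d\theta + p_i^2$

        $= p_i (1 - p_i) = (m-1)^{-2} (1 - p_R)(m - 2 + p_R)$ ,

where $p_i = E(u_i)$.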

We notice that E(u) given in (4.11) is the item difficulty parameter
in classical test theory, which depends upon the specific population
of examinees as well as the test item.

It should be noted that both the expectation and the variance
of $u_i$ for $i \neq R$, which are given by (4.13) and (4.14),
respectively, are equal for all the wrong answers, and are determined
solely by $p_R$ and the number of the alternatives, m. This is the
logical consequence of the fact that the responses to those wrong
answers are completely the result of random guessing, and provide us
with no information about the examinees' ability levels.
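
The statement that the moments of the alternative score depend only
on the probability of the correct answer and on m can be verified
numerically. The sketch below is illustrative only. It assumes a
standard normal ability density f (the text does not fix f), replaces
the integrals by a normalised sum over an equally spaced grid, and
uses hypothetical item parameters and function names.

```python
import math


def psi_normal_ogive(theta, a, b):
    """Free-response item characteristic function (4.1)."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))


def distractor_moments(a, b, m, n_points=2001, lo=-6.0, hi=6.0):
    """Return (p_R, E(u_i), Var(u_i)) for any wrong alternative i."""
    step = (hi - lo) / (n_points - 1)
    thetas = [lo + j * step for j in range(n_points)]
    raw = [math.exp(-0.5 * t * t) for t in thetas]   # assumed N(0, 1) density
    total = sum(raw)
    weights = [w / total for w in raw]               # weights sum to one
    c = 1.0 / m
    psi = [psi_normal_ogive(t, a, b) for t in thetas]
    p_correct = sum(w * (c + (1.0 - c) * s) for w, s in zip(weights, psi))  # (4.11)
    cond_mean = [(1.0 - s) / m for s in psi]                                 # (4.7)
    mean_wrong = sum(w * e for w, e in zip(weights, cond_mean))              # (4.13)
    within = sum(w * e * (1.0 - e) for w, e in zip(weights, cond_mean))      # E[Var(u_i|theta)]
    between = sum(w * (e - mean_wrong) ** 2 for w, e in zip(weights, cond_mean))
    return p_correct, mean_wrong, within + between                           # (4.14)


m = 5
p_correct, mean_wrong, var_wrong = distractor_moments(a=1.0, b=0.0, m=m)
print(p_correct, mean_wrong, var_wrong)
print((1.0 - p_correct) / (m - 1))
print((1.0 - p_correct) * (m - 2 + p_correct) / (m - 1) ** 2)
```

The first line of output should agree with the two closed forms
printed after it, whatever values of a and b are chosen.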

We must remember, however, that most conscientious test
constructors try to avoid the contamination of the quality of
items by finding incorrect, but plausible, answers and including
them as distractors in the set of alternatives. This indicates
that the responses to these alternatives are not the result of random
guessing, and may contain useful information about the examinee's
ability level. The adoption of one of the three-parameter models
for such multiple-choice items is not justifiable, since in so doing
the researchers distort psychological reality and will produce
nothing but meaningless artifacts as the result of their research.

It is strange to the author that many researchers have ignored
the contradiction described in the preceding paragraphs, and have for
years applied the three-parameter models to their data, which,
obviously, are based on tests containing many distractors. As long as
they continue repeating this mistake, their conscientiousness
as researchers has to be questioned.

V Index k* for Invalidating Three-Parameter Models

It has been pointed out in Chapter 3 that Sato's number of
hypothetical, equivalent alternatives takes on a high value, if
every examinee in the group has selected one of the m alternatives
at random. This fact implies that, although the index was introduced
for quite an opposite purpose, it may also be useful in detecting
the examinee's random guessing behavior in the multiple-choice
item.

To materialize the above, we need the following consideration.
When the examinee follows the knowledge or random guessing principle
and the item characteristic function assumes the three-parameter
logistic, or normal ogive, model, the index k is solely affected
by the probability with which the examinee knows the answer, as is
obvious from Figure 2-1 and (4.3) and (4.11). This fact is somewhat
inconvenient, however, for the probability of knowing the
answer heavily depends upon the specific population of examinees, in
addition to the item characteristic function of the item in the
free-response situation. It will be more convenient, therefore,
if we can modify Sato's index k in such a way that it is unaffected
by the ability distribution of a specific population of examinees,
and can be considered as a pure property of the item. With this
aim in mind, we shall introduce a new index in this chapter.

Let A be the event that the examinee does not know the
answer to item g, and consider the probability space which
consists of such a subpopulation of examinees. The conditional
probability, $p(i \mid A)$, with which the examinee selects the
alternative i of item g in this conditional probability space is
given by

(5.1)   $p(i \mid A) = p_i [\sum_{s \neq R} p_s + p_R^{*}]^{-1}$ ,       $i \neq R$
        $p(i \mid A) = p_R^{*} [\sum_{s \neq R} p_s + p_R^{*}]^{-1}$ ,   $i = R$

where $p_R^{*}$ denotes the probability with which the examinee guesses
correctly for item g. The new index, k*, is defined in terms
of these conditional probabilities, in such a way that

(5.2)   $k^{*} = \exp[-\sum_{i=1}^{m} p(i \mid A) \log p(i \mid A)] = [\prod_{i=1}^{m} p(i \mid A)^{p(i \mid A)}]^{-1}$ .

It is obvious that $p(i \mid A)$ for $i \neq R$ is proportional to
$p_i$, for every examinee in the population who has selected one of
the wrong answers does not know the answer, and, consequently, he is
also in the subpopulation A. On the other hand, examinees who have
selected the correct answer R are not necessarily in the
subpopulation A, so we can write

(5.3)   $p_R^{*} \leq p_R$ .

Note that, if the examinee's behavior follows the knowledge or random
guessing principle and the item characteristic function of the
multiple-choice item g is of one of the three-parameter models,
$p_R^{*}$ equals $p_i$ for $i \neq R$, and, as the result, all the m
$p(i \mid A)$'s are equal and k* = m.

In practice, we need to use some estimates for $p(i \mid A)$'s,
to obtain the estimate of k*. Since we have the frequency ratio,
$P_i$, for the estimate of $p_i$ for $i \neq R$, all we need to do is
to find an appropriate estimate of $p_R^{*}$. Let $P_R^{*}$ denote such
an estimate of $p_R^{*}$, and $P_i^{*}$ be such that

(5.4)   $P_i^{*} = P_i$ ,       $i \neq R$
        $P_i^{*} = P_R^{*}$ ,   $i = R$ .

Then we can write for the estimate of $p(i \mid A)$ such that

(5.5)   $\hat{p}(i \mid A) = P_i^{*} [\sum_{s=1}^{m} P_s^{*}]^{-1}$ .

We are to take the strategy of finding $P_R^{*}$ which makes
$\hat{k}^{*}$ maximal. Define $\hat{H}^{*}$ such that

(5.6)   $\hat{H}^{*} = \log \hat{k}^{*} = -\sum_{i=1}^{m} \hat{p}(i \mid A) \log \hat{p}(i \mid A)$

        $= -[\sum_{s=1}^{m} P_s^{*}]^{-1} [\sum_{i=1}^{m} P_i^{*} \log P_i^{*} - (\sum_{s=1}^{m} P_s^{*}) \log \{\sum_{s=1}^{m} P_s^{*}\}]$ .

Then the partial derivative of $\hat{H}^{*}$ with respect to $P_R^{*}$
can be written as

(5.7)   $\partial \hat{H}^{*} / \partial P_R^{*} = [\sum_{s=1}^{m} P_s^{*}]^{-2} [\sum_{i=1}^{m} P_i^{*} \log P_i^{*} - (\sum_{s=1}^{m} P_s^{*}) \log P_R^{*}]$ ,

and, setting this equal to zero, we obtain

(5.8)   $\log P_R^{*} = [\sum_{s \neq R} P_s]^{-1} \sum_{i \neq R} P_i \log P_i$

and then

(5.9)   $P_R^{*} = \prod_{i \neq R} P_i^{P_i [\sum_{s \neq R} P_s]^{-1}}$ .

Thus we can use (5.9) in (5.4), and, therefore, obtain
$\hat{p}(i \mid A)$ through (5.5). The estimate of the new index,
$\hat{k}^{*}$, is given by

(5.10)   $\hat{k}^{*} = \exp[-\sum_{i=1}^{m} \hat{p}(i \mid A) \log \hat{p}(i \mid A)] = [\prod_{i=1}^{m} \hat{p}(i \mid A)^{\hat{p}(i \mid A)}]^{-1}$ .

A necessary, though not sufficient, condition for one of the
three-parameter models to be valid is that $\hat{k}^{*}$ should be
equal to m within sampling fluctuations, regardless of the population
of examinees from which our sample happened to be selected. If this is
not the case, we must say that the three-parameter model does not
fit our item, i.e., the model is invalidated.
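
The estimation steps (5.4), (5.5), (5.8), (5.9) and (5.10) can be
sketched as follows. This is a minimal illustration, not a definitive
implementation: the function name and the two sets of frequency ratios
are hypothetical, and the sketch assumes that every distractor has
been chosen at least once, since (5.8) takes the logarithm of each
distractor's frequency ratio.

```python
import math


def k_star_estimate(freq, correct_index):
    """Return (P_R*, [p_hat(i|A)], k_hat*) from frequency ratios P_i."""
    wrong = [p for i, p in enumerate(freq) if i != correct_index]
    if any(p <= 0.0 for p in wrong):
        # Assumption of this sketch: (5.8) needs log P_i for each distractor.
        raise ValueError("every distractor needs a positive frequency ratio")
    wrong_total = sum(wrong)
    # (5.8): log P_R* is the weighted mean of log P_i over the distractors.
    log_pr_star = sum(p * math.log(p) for p in wrong) / wrong_total
    pr_star = math.exp(log_pr_star)                       # (5.9)
    p_star = [pr_star if i == correct_index else p        # (5.4)
              for i, p in enumerate(freq)]
    total = sum(p_star)
    conditional = [p / total for p in p_star]             # (5.5)
    k_star = math.exp(-sum(q * math.log(q) for q in conditional))  # (5.10)
    return pr_star, conditional, k_star


# Hypothetical five-choice items with the correct answer in position 0.
equal_distractors = [0.6, 0.1, 0.1, 0.1, 0.1]
unequal_distractors = [0.5, 0.3, 0.1, 0.05, 0.05]

print(k_star_estimate(equal_distractors, 0)[2])
print(k_star_estimate(unequal_distractors, 0)[2])
```

When the distractors are equally frequent the estimate equals m,
whatever the proportion correct; an especially attractive distractor
pulls it below m.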

Although the invalidation of the three-parameter logistic,
or normal ogive, model is easy, its validation is more difficult.
We recall that Sato's number of hypothetical, equivalent alternatives
is used as a measure of the desirability of the item for a specific
population of examinees. If all the distractors are equally probable
for a specific population, then the index k* will also equal m,
in spite of the fact that the two cases are completely different
in nature. This problem can be solved by administering the same
test to a different group of examinees, which has a different
ability distribution from that of the first group. If the large
value of k* is due to the knowledge or random guessing principle,
then it will also be large for the second group of examinees because
of its population-free nature. On the other hand, if the large
value of k* results from the optimal quality of the item for
the first group of examinees, then it will not be as large
for the second group, unless the operating characteristics of all
the distractors are identical.

It should be emphasized that k* takes on a large value even
if the knowledge or random guessing principle does not work behind
the examinee's behavior, but the item is "suitable" for the group
of examinees to which the test has been administered, in the same
sense that a high value of Sato's number of hypothetical, equivalent
alternatives is meant to indicate. This fact means that, when we
need to use only one set of data for validating, or invalidating,
the knowledge or random guessing principle and the three-parameter
logistic, or normal ogive, model, we must use at least one more
necessary condition for the principle to be valid. One such
necessary condition is that the sample means of ability $\theta$, or
of its estimate, of the subgroups of examinees who have selected
the wrong answers should be equal, within the range of sampling
fluctuations. Thus, if either the value of k* is substantially
less than m, or the sample means of ability $\theta$ of such subgroups
of examinees are not close to each other, then we shall be able
to say that the knowledge or random guessing principle and the
three-parameter model are invalidated. On the other hand, if both
of the necessary conditions are satisfied with our data, we can say
there is no reason to reject the principle and the model.

For the purpose of illustration, a set of simulated data was
generated, using the Monte Carlo method. In this set of data,
five hypothetical multiple-choice test items were assumed, each
having five alternatives, A, B, C, D and E, with A always as the
correct answer. Each item is assumed to follow the three-parameter
normal ogive model, which is given by (4.1) and (4.3), with the
parameter values shown in Table 5-1. A group of five hundred
hypothetical examinees was assumed, whose ability levels are placed
at one hundred equally spaced points on the ability continuum,
which start with -2.475 and end with 2.475, in such a way that
subjects 1 through 5 are placed at θ = -2.475 , subjects 6 through
10 are at θ = -2.425 , and so on. For each of the five hypothetical
multiple-choice items, the response of each of the five hundred
hypothetical examinees was generated according to the specified
item characteristic function and the knowledge or random guessing
principle. These generated responses are presented as Table A-1
in Appendix I.

TABLE 5-1

Item Discrimination Parameter, a_g , and Item Difficulty Parameter,
b_g , of Each of the Five Hypothetical, Binary Items Following the
Three-Parameter Normal Ogive Model, with c_g = 0.2 .

     Item      a_g      b_g

      1       1.00     0.00
      2       1.50     0.00
      3       2.00     0.00
      4       2.50     0.00
      5       3.50     0.00
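
The data generation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's program: it assumes that, under the knowledge or random guessing principle, an examinee knows the answer with probability Φ(a(θ - b)) and otherwise picks one of the five alternatives at random, which yields the guessing parameter 0.2 of Table 5-1. The function names and the seed are hypothetical.

```python
import math
import random

ALTERNATIVES = "ABCDE"   # A is always the correct answer


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ability_levels():
    """Five examinees at each of 100 equally spaced points, -2.475 to 2.475."""
    return [-2.475 + 0.05 * (j // 5) for j in range(500)]


def simulate_item(a, b, thetas, rng):
    """Assumed mechanism: know the answer with probability Phi(a(theta - b)),
    otherwise choose one of the five alternatives at random."""
    responses = []
    for theta in thetas:
        if rng.random() < normal_cdf(a * (theta - b)):
            responses.append("A")
        else:
            responses.append(rng.choice(ALTERNATIVES))
    return responses


rng = random.Random(1979)            # arbitrary seed, for reproducibility only
thetas = ability_levels()
responses = simulate_item(a=1.0, b=0.0, thetas=thetas, rng=rng)
ratios = {k: responses.count(k) / len(responses) for k in ALTERNATIVES}
print(ratios)
```

With b = 0 and a symmetric set of ability levels, the ratio for A should fall near 0.6 and each distractor near 0.1, subject to sampling fluctuation.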

Table 5-2 presents the frequency ratio, P_k , of each of
the five alternatives, for each of the five hypothetical multiple-
choice items. We can see that sampling fluctuations are fairly
large for item 4, and to a lesser degree for item 2, since the
corresponding probability, p_k , is 0.6 for the alternative A and
0.1 for each of the alternatives B, C, D and E. In the same table,
also presented are the values of P*_k , which were obtained through
(5.9). Using these values in (5.6), (5.9) and (5.10), the estimates
of the entropy H* and the index k* were obtained, and are
presented in Table 5-3. Since the maximal possible value of H*
is approximately 1.60944 (= log m) and that of k* is 5 (= m) , we
can say that these results are sufficiently close to their respective
maximal values, i.e., an exemplification of the satisfaction of one
of the necessary conditions for validating the three-parameter
normal ogive model and the knowledge or random guessing principle
by our simulated data. The fact that these results are less
satisfactory for item 4, and, to a lesser degree, for item 2,
must be due to the sampling fluctuations, which were
observed in Table 5-2.

TABLE 5-2

Frequency Ratio of the Subjects, P_k , Who Selected Each of the Five
Alternatives, and the Modified Frequency Ratio, P*_k , for the Correct
Answer A, for Each of the Five Hypothetical Items.

                          Alternative
     Item   Index      A      B      C      D      E

      1     P_k      .608   .086   .106   .100   .100
            P*_k     .098
      2     P_k      .618   .102   .080   .106   .094
            P*_k     .096
      3     P_k      .600   .094   .106   .108   .092
            P*_k     .100
      4     P_k      .606   .104   .078   .130   .082
            P*_k     .101
      5     P_k      .598   .092   .100   .104   .106
            P*_k     .101

TABLE 5-3

Entropy, H* , and the Number of Hypothetical, Equivalent
Alternatives, k* , for Each of the Five Hypothetical Items
Following the Three-Parameter Normal Ogive Model.

     Item       H*         k*

      1      1.60714    4.98853
      2      1.60501    4.97789
      3      1.60744    4.99000
      4      1.59224    4.91475
      5      1.60829    4.99424
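
As a rough check of how H* and k* follow from the frequency ratios, the sketch below recomputes them for item 3 of Table 5-2. Formulas (5.6), (5.9) and (5.10) are not restated in this section, so the code rests on an assumed reading of them: the modified ratio for the correct answer is taken as the mean of the four distractor ratios, the five ratios are renormalized to sum to one, H* is their entropy in natural logarithms, and k* = exp(H*). The function names are hypothetical.

```python
import math


def modified_ratios(p_wrong):
    """Assumed reading of (5.9): replace the correct-answer ratio by the mean
    of the distractor ratios, then renormalize so the ratios sum to one."""
    p_star_correct = sum(p_wrong) / len(p_wrong)
    values = [p_star_correct] + list(p_wrong)
    total = sum(values)
    return [v / total for v in values]


def entropy(ps):
    """Entropy in natural logarithms (assumed form of (5.6))."""
    return -sum(p * math.log(p) for p in ps if p > 0.0)


# Item 3 of Table 5-2: frequency ratios of distractors B, C, D and E.
item3_wrong = [0.094, 0.106, 0.108, 0.092]
h_star = entropy(modified_ratios(item3_wrong))
k_star = math.exp(h_star)            # assumed form of (5.10)
print(round(h_star, 5), round(k_star, 5))
```

Under this reading, the values for item 3 should agree closely with the corresponding row of Table 5-3; when all four distractor ratios are equal, H* reaches log m and k* reaches m.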

As another necessary condition for validating the three-
parameter normal ogive model and the knowledge or random guessing
principle, the mean of θ for each of the five subgroups of
examinees, who selected different alternatives, was computed, for
each of the five multiple-choice items. Table 5-4 presents
these means of θ . In the same table, also presented
is the expectation of θ for each of the five subgroups, using
the uniform ability distribution for the interval, (-2.5, 2.5],
for each item, following the three-parameter normal ogive model
and the knowledge or random guessing principle. Since all the
responses to any of the four wrong answers of each item are nothing
but the result of random guessing, these alternatives are equivalent,
and have the same mean value of θ . We can see that, for each
item, the mean of θ for the correct answer and that of each
incorrect answer are substantially different, and they are close
enough to the respective theoretical means.
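
The theoretical means in Table 5-4 can be approximated numerically. The sketch below is an illustration rather than the author's computation: it assumes the item characteristic function c + (1 - c)Φ(a(θ - b)) with c = 0.2 and a uniform ability density on (-2.5, 2.5], and integrates by the midpoint rule. Because every wrong answer is a random guess under the principle, a single conditional mean serves for all four distractors. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def theoretical_means(a, b, c=0.2, lo=-2.5, hi=2.5, steps=20000):
    """E(theta | correct) and E(theta | a wrong answer) under a uniform
    ability distribution on (lo, hi], computed by the midpoint rule."""
    width = (hi - lo) / steps
    num_c = den_c = num_w = den_w = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * width
        p_correct = c + (1.0 - c) * normal_cdf(a * (theta - b))
        num_c += theta * p_correct
        den_c += p_correct
        num_w += theta * (1.0 - p_correct)
        den_w += 1.0 - p_correct
    return num_c / den_c, num_w / den_w


# Item 1 of Table 5-1: a = 1.00, b = 0.00.
mean_correct, mean_wrong = theoretical_means(a=1.0, b=0.0)
print(round(mean_correct, 3), round(mean_wrong, 3))
```

For item 1 the two values should come out near the theoretical means 0.703 and -1.054 listed in Table 5-4.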

In practice, there is no way to observe the examinee's θ
itself. We can, however, use its maximum likelihood estimate, θ̂ ,
as the substitute in the above process, for example.
We must obtain a result similar to the above, to validate the three-
parameter models and the knowledge or random guessing principle.



TABLE 5-4

Sample Mean of θ for the Subgroup of Hypothetical Examinees Who
Selected Each of the Five Alternatives, and Its Corresponding
Theoretical Mean, E(θ) , for Each of the Five Multiple-Choice Items.

             A (Correct)            B, C, D, E (Incorrect)
             Theo-   Sample     Sample mean of θ                Theo-
             retical mean                                       retical
     Item    E(θ)    of θ        B       C       D       E      E(θ)

      1      0.703   0.619    -0.912  -1.017  -0.994  -0.905   -1.054
      2      0.774   0.752    -1.341  -1.084  -1.249  -1.161   -1.161
      3      0.800   0.811    -1.165  -1.233  -1.224  -1.237   -1.200
      4      0.812   0.809    -1.230  -1.119  -1.253  -1.369   -1.218
      5      0.822   0.809    -1.061  -1.193  -1.260  -1.282   -1.234

We notice that a result similar to the one in our example
can be obtained, if, incidentally, all the distractors require
"on the average" approximately the same level of ability for the
examinee to be attracted to them, for our group of examinees.
This fact indicates that it is desirable to add more necessary
conditions to examine, such as the approximate equality of the
second moment of θ , or θ̂ , that of the third moment, etc.,
for the subgroups of examinees who have selected the wrong answers.
Since these subgroups of examinees are "equivalent" in ability
distribution if the knowledge or random guessing principle and
the three-parameter model are valid, these higher moments should
be equal within sampling fluctuations, while it is highly unlikely
that all the subgroups of examinees who have been attracted to
separate distractors are equivalent in ability distribution. We
must avoid, however, using moments of too high degrees, for their
sampling fluctuations tend to be enormously great.
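
A comparison of the lower moments across the wrong-answer subgroups might be organized as in the sketch below. It is an illustration only: the text does not say whether raw or central moments are intended, so central moments are assumed here, and the data are made up for the example. The function name is hypothetical.

```python
from collections import defaultdict


def subgroup_moments(thetas, choices, correct="A"):
    """Mean, second central moment and third central moment of ability for
    the subgroup of examinees selecting each wrong alternative."""
    groups = defaultdict(list)
    for theta, choice in zip(thetas, choices):
        if choice != correct:
            groups[choice].append(theta)
    result = {}
    for choice, values in sorted(groups.items()):
        n = len(values)
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values) / n
        m3 = sum((v - mean) ** 3 for v in values) / n
        result[choice] = (mean, m2, m3)
    return result


# Made-up ability values and selected alternatives, for illustration only.
thetas = [-1.0, 0.0, 1.0, -2.0, 0.0, 2.0, 0.5]
choices = ["B", "B", "B", "C", "C", "C", "A"]
moments = subgroup_moments(thetas, choices)
for choice, (mean, m2, m3) in moments.items():
    print(choice, mean, m2, m3)
```

In this toy example the two subgroups share the same mean but differ in the second moment, which is the kind of discrepancy that a comparison of means alone would miss.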



VI   Shiba's Research on the Measurement of Vocabulary

In this chapter, we shall introduce research on the
measurement of vocabulary, which was conducted by Shiba and others.
The author found it interesting, especially in the following aspects.

(1) The vocabulary tests they used are very well constructed,
with each alternative chosen carefully.

(2) Subjects were selected from many different age groups.

(3) Unlike many researchers in the United States, they have
tried to make full use of the distractors.

The battery of tests used for the construction of the
vocabulary scale consists of eleven tests, A1, A2, A3, A4, A5, A6,
J1, J2, S1, S2 and U . Each test contains thirty to fifty-eight
multiple-choice items, each having a set of five alternatives.
These tests differ in difficulty, and each of them is designed for a
different age group, ranging from six years of age to the ages of
college students. There are subsets of items included in two tests,
which are adjacent to each other in difficulty. For example,
items 37 through 56 of Test J1 are also items 1 through 20 of Test
J2. The number of examinees used for the vocabulary scale
construction varies between 412 sixth graders of elementary schools
for Test A6 and 924 second graders of senior high schools for Test
S1. (cf. Shiba, 1978.)

The model adopted for the item characteristic function of
each vocabulary item is the logistic model, such that

(6.1)    P_g(\theta) = [1 + \exp\{-D a_g (\theta - b_g)\}]^{-1} ,

where a_g and b_g are the item discrimination and difficulty
parameters, respectively, and D = 1.7 . Note that Shiba did not
use the three-parameter logistic model, which is characterized by
(4.2) and (4.3). This is based on his belief that three-parameter
models are not applicable to well-developed multiple-choice items,
which he has formed through his extensive experience in test construction
and research.
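
The role of the scaling constant D = 1.7 in (6.1) can be seen by comparing the logistic curve with the two-parameter normal ogive that has the same a_g and b_g. The sketch below is illustrative only; the grid and the function names are hypothetical choices, not part of Shiba's procedure.

```python
import math

D = 1.7


def logistic_icc(theta, a, b):
    """Logistic item characteristic function, equation (6.1)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def normal_ogive_icc(theta, a, b):
    """Two-parameter normal ogive with the same discrimination and difficulty."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))


# Largest gap between the two curves on a fine grid of ability values.
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(
    abs(logistic_icc(t, 1.0, 0.0) - normal_ogive_icc(t, 1.0, 0.0)) for t in grid
)
print(max_gap)
```

With D = 1.7 the two curves stay within one hundredth of each other over the whole grid, which is why the logistic function can stand in for the normal ogive.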

Each of the eleven tests was administered to a group of subjects
who belong to a single school year, except for college students.
Hereafter, for convenience, we shall use EL for elementary schools,
JH for junior high schools, SH for senior high schools, and CS for
colleges, and add the school year after each symbol. For instance,
by SH2 we mean a group of subjects who are in the second year of
senior high schools. The correspondence of the subject groups and
the tests administered is summarized as follows:

     A1 for EL1 (650),  A2 for EL2 (650),  A3 for EL3 (546),

     A4 for EL4 (617),  A5 for EL5 (599),  A6 for EL6 (412),

     J1 for JH1 (614),  J2 for JH2 (758),  S1 for SH1 (924),

     S2 for SH2 (759)  and  U for CS (740) ,

where the numbers in parentheses indicate the respective numbers of
examinees. Note that JH3 and SH3 are not included in the data
which are the basis of the vocabulary scale construction.

The main steps for analyzing these data are the following.

[A] For each of the eleven groups of examinees, the ability
    distribution is assumed to be the standard normal distribution.

[B] Assuming the normal ogive model, such that

(6.2)    P_g(\theta) = (2\pi)^{-1/2} \int_{-\infty}^{a_g(\theta - b_g)} e^{-u^2/2} \, du ,

    where a_g and b_g are the item discrimination and difficulty
    parameters, respectively, and assuming the local independence of
    the item variables (Lord and Novick, 1968, Chapter 16), and also
    that the regression of each item variable on ability θ is
    linear, the tetrachoric correlation coefficient is computed
    for each and every pair of items.

[C] The principal factor solution of factor analysis is applied
    to the correlation matrix thus obtained, using the largest
    absolute value of the correlation coefficient in each row,
    or column, as the communality. This step is also the process
    of validating the uni-dimensionality of ability θ . Figure
    6-1 illustrates the resulting set of eigenvalues for Test J1,
    which was administered to 614 first year junior high school
    students. It turned out that the first eigenvalue is much
    larger than all the other eigenvalues, and thus the uni-
    dimensionality was confirmed. Hereafter, this first principal
    factor is treated as θ .

[D] From the result of factor analysis, the item parameters are
    obtained. Let ρ_g be the factor loading (e.g., Lawley and
    Maxwell, 1971) of the first principal factor, or θ , for item
    g . The item discrimination parameter, a_g , is obtained by

(6.3)    a_g = \rho_g (1 - \rho_g^2)^{-1/2} .

    Let Φ(u) denote the standard normal distribution function,
    such that

(6.4)    \Phi(u) = (2\pi)^{-1/2} \int_{-\infty}^{u} e^{-t^2/2} \, dt .

    The item difficulty parameter, b_g , is given by

(6.5)    b_g = \Phi^{-1}(1 - p_g) \, \rho_g^{-1} ,

    where p_g is the probability with which the examinee answers
    item g correctly. In practice, this is replaced by the
    corresponding frequency ratio, to provide us with the estimate
    of b_g .

FIGURE 6-1

Eigenvalues of the Correlation Matrix of the Fifty-Five Items
of Test J1, Ordered with Respect to Their Magnitudes.

[E] The eleven ability scales thus constructed are considered to
    be on the same continuum, and they are integrated into a single
    scale. This equating is made through the ten subsets of items,
    each of which is shared by two adjacent tests. Let a_g and
    b_g be the item parameters estimated from the result of the
    first test, and a*_g and b*_g be those from the result of the
    second test. Denoting the two ability scales by θ and θ* ,
    respectively, we can write

(6.6)    a_g (\theta - b_g) = a_g^* (\theta^* - b_g^*) ,

    since the item characteristic functions, which follow the
    normal ogive model, of the same item g on the two ability
    scales must assume the same value for the corresponding values
    of θ and θ* . Thus the functional relationship between
    θ and θ* is given by

(6.7)    \theta^* = (a_g / a_g^*) \theta + [ b_g^* - (a_g / a_g^*) b_g ] ,

    which is linear, and the two coefficients are obtained from
    these four parameters. In practice, we obtain as many sets
    of coefficients as the number of common items, and we need to
    use some type of "average" of these coefficients for the scale
    transformation. Figure 6-2 presents the ability distributions
    of the eleven subject groups after such transformations were
    made and the mean and the standard deviation of the distribution
    of J1 are taken as the origin and the unit for the new,
    integrated ability dimension.

[F] The item characteristic function of each item on the new,
    integrated scale θ is approximated by the logistic function,
    which is given by (6.1).

[G] The maximum likelihood estimate, θ̂_j , of each examinee's
    ability is obtained through the equation

(6.8)    \sum_{g=1}^{n} a_g P_g(\hat{\theta}_j) = \sum_{g=1}^{n} a_g u_{gj} ,

    (cf. Birnbaum, 1968), where u_gj is the binary item score of
    individual j for item g .

[H] The test information function of each test is obtained by

(6.9)    I(\theta) = \sum_{g=1}^{n} I_g(\theta) ,

    where I_g(θ) is the item information function of item g such
    that

(6.10)   I_g(\theta) = [P_g'(\theta)]^2 \, [P_g(\theta) \{1 - P_g(\theta)\}]^{-1} .

    Figure 6-3 presents the test information functions thus
    obtained for the eleven tests.

FIGURE 6-2

Estimated Density Functions of the Twelve Groups of Examinees, Which Are Assumed to Be Normal.
The Ability Scale Is Defined in Such a Way that the Density Function of the
First Grade Group of Junior High School (JH1) Is n(0,1) .

[I] The theoretical frequency distribution of test score T for
    each test and examinee group can be written as

(6.11)   \sum_{\{V : \sum_g u_g = T\}} \prod_{g=1}^{n} [P_g(\theta)]^{u_g} [1 - P_g(\theta)]^{1 - u_g} ,

    where V is a response pattern, or a vector of n item scores,
    and T is the test score given by

(6.12)   T = \sum_{g=1}^{n} u_g .

    This is used for the validation of the model and assumptions
    adopted in the process of analysis. Figure 6-4 illustrates
    the goodness of fit of this theoretical frequency distribution
    of test score to the actual frequency distribution, for Test
    J1.

[J] The sample mean of the maximum likelihood estimate θ̂ of the
    subgroup of examinees who selected each of the five alternatives
    is calculated, for each item of each test.

[K] A tailored test of vocabulary is constructed by selecting an
    appropriate subset of items from these eleven tests, in such
    a way that an individual is directed to a next item which is
    chosen on the basis of the sample mean of θ̂ of the alternative
    he has selected for the present item.

FIGURE 6-4

Theoretical and Observed (shaded) Frequency Distributions of the Test Score
of Test J1 .

We have seen in the preceding paragraphs a brief sketch of
the work of Shiba and others. It is unfortunate that the author cannot
convey the fine quality of the tests themselves to the reader, for
they are vocabulary tests and their translation from Japanese into
English would certainly destroy the nature of the tests. We can
see that the research has been conducted very conscientiously,
however, including several processes of validation, and has eventually
produced a widely applicable vocabulary scale and a tailored test.
In the latter result, although there is some room for improvement,
the use of distractors for "branching" subjects should be taken
as a stimulation to the researchers who are engaged in this area,
for it has seldom been seriously investigated by other researchers.
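
Several of the computational steps sketched above, namely the item parameters of step [D], the scale transformation of step [E], the maximum likelihood estimate of step [G] and the test information function of step [H], can be illustrated in Python. This is a minimal sketch under stated assumptions and not Shiba's program: the item parameters and item scores are hypothetical, bisection is only one of several ways to solve (6.8), and all function names are invented for the example.

```python
import math
from statistics import NormalDist

D = 1.7


def discrimination(rho):
    """Equation (6.3): a_g from the first-factor loading rho_g."""
    return rho / math.sqrt(1.0 - rho * rho)


def difficulty(p_correct, rho):
    """Equation (6.5): b_g from the proportion correct and the loading."""
    return NormalDist().inv_cdf(1.0 - p_correct) / rho


def linking_coefficients(a, b, a_star, b_star):
    """Equation (6.7): slope and intercept of theta* = slope * theta + intercept."""
    slope = a / a_star
    return slope, b_star - slope * b


def icc(theta, a, b):
    """Logistic item characteristic function, equation (6.1)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def mle_theta(items, scores, lo=-6.0, hi=6.0, tol=1e-10):
    """Solve (6.8) by bisection. The score pattern must contain at least one
    right and one wrong answer, otherwise no finite solution exists."""
    target = sum(a * u for (a, _), u in zip(items, scores))

    def excess(theta):
        return sum(a * icc(theta, a, b) for a, b in items) - target

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def total_information(theta, items):
    """Equations (6.9) and (6.10); for (6.1), P' = D a P (1 - P)."""
    total = 0.0
    for a, b in items:
        p = icc(theta, a, b)
        slope = D * a * p * (1.0 - p)
        total += slope ** 2 / (p * (1.0 - p))
    return total


items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]    # hypothetical (a_g, b_g) pairs
theta_hat = mle_theta(items, scores=[1, 1, 0])
print(theta_hat, total_information(theta_hat, items))
```

In an application, linking_coefficients would be evaluated once per common item and the resulting slopes and intercepts averaged in some way, as step [E] notes.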

The research conducted by Shiba and others includes more
interesting data than were used in the vocabulary scale construction.
Table 6-1 presents a part of them, in which the frequency
distribution of the alternative selection and the mean of the
maximum likelihood estimate of ability for each alternative are
shown for nineteen items included in both Tests J1 and J2, and
administered to four different subject groups, JH1, JH2(a), JH2(b)
and JH3. In the same table, also presented is the discrepancy
between the mean of θ̂ for the correct answer and the lowest
mean θ̂ for one of the four wrong answers, under the heading,
"largest discrepancy." The correct answers are always identified
as the ones which have the highest means of θ̂ , except for the one
for item 8 administered to JH2(b), which is the second highest.

TABLE 6-1

Mean of the Maximum Likelihood Estimates of Ability, θ̂ , for Each of the
Five Subgroups of Subjects Selecting Different Alternatives, for Each of
the 19 Vocabulary Test Items, Together with the Actual Frequency Distri-
butions (FRQ). The Difference between the Mean θ̂ of the Correct Sub-
group and the Lowest Mean θ̂ Is Also Presented As Largest Discrepancy
for Each Item. Each Entry Is Mean θ̂ (FRQ); ? Marks an Illegible Digit or Entry.

Test J1, Junior High School Grade 1

                                  Alternative                                         Largest
 Item       1             2             3             4             5        Total  Discrepancy

  37    0.401 (287)  -0.476 ( 50)  -0.482 ( 59)  -0.750 ( 59)  -0.148 (117)   572     1.151
  38       --            --            --            --            --          --       --
  39   -0.192 ( 91)  -0.091 (115)  -0.270 (118)  -0.243 ( 51)   0.400 (187)   562     0.670
  40    0.071 ( 60)  -0.416 (141)  -0.336 ( 90)   0.310 (273)  -0.479 (  9)   573     0.789
  41   -0.557 (  ?)  -1.007 ( 20)  -0.445 ( 23)  -0.456 ( 85)   0.254 (  ?)   573     1.261
  42    0.339 (247)  -0.570 ( 21)   0.036 (121)  -0.439 ( 84)  -0.387 ( 97)   570     0.909
  43   -0.512 ( 26)   0.376 (308)  -0.572 ( 98)  -0.245 ( 67)  -0.393 ( 73)   572     0.948
  44   -0.293 (119)  -0.547 ( 67)  -0.595 ( 14)   0.271 (333)  -0.318 ( 36)   569     0.866
  45   -0.638 (  ?)  -0.412 (  ?)  -0.636 (  ?)   0.395 (346)  -0.593 ( 23)   568     1.033
  46    0.444 (296)  -0.741 ( 46)  -0.325 ( 44)  -0.428 (164)  -0.534 ( 18)   568     1.185
  47   -0.261 ( 69)   0.270 (224)  -0.078 (158)  -0.426 ( 53)  -0.101 ( 65)   569     0.696
  48   -0.129 ( 81)  -0.024 (100)  -1.013 ( 58)  -0.467 ( 67)   0.412 (258)   564     1.425
  49   -0.339 (115)  -0.390 ( 31)  -0.284 ( 42)  -0.464 ( 70)   0.309 (315)   573     0.773
  50    0.349 (308)  -0.256 ( 46)  -1.015 ( 35)  -0.317 ( 36)  -0.385 ( 96)   571     1.364
  51   -0.137 ( 89)  -0.640 ( 82)  -0.077 ( 75)  -0.136 (113)   0.429 (201)   560     1.069
  52   -0.219 (116)   0.291 (235)  -0.110 ( 80)  -0.608 ( 34)  -0.095 (100)   565     0.899
  53   -0.071 (163)  -0.030 ( 51)  -0.453 ( 34)   0.527 (143)  -0.241 (181)   572     0.980
  54    0.132 (182)  -0.060 (111)  -0.084 (100)  -0.037 (142)  -0.283 ( 26)   561     0.415
  55    0.1?4 ( 27)  -0.278 ( 72)  -0.172 (317)  -0.533 ( 29)   0.690 (126)   571     1.223
  56   -0.460 (104)  -0.113 (101)  -0.412 (115)   0.742 (141)   0.015 (111)   572     1.202

TABLE 6-1 (Continued) : Test J1, Junior High School Grade 2

                                  Alternative                                         Largest
 Item       1             2             3             4             5        Total  Discrepancy

  37    0.886 (269)  -0.215 ( 39)  -0.249 ( 39)  -0.312 ( 37)   0.023 ( 71)   455     1.193
  38       --            --            --            --            --          --       --
  39    0.334 ( 55)   0.136 ( 97)   0.083 ( 82)  -0.063 ( 50)   1.015 (166)   450     1.083
  40    0.521 ( 61)  -0.133 ( 95)   0.109 ( 45)   0.302 (243)  -0.236 ( 14)   453     1.088
  41   -0.553 ( 27)  -0.440 ( 13)  -0.173 ( 19)  -0.019 ( 47)   0.665 (355)   461     1.213
  42    0.810 (257)  -0.426 ( 14)   0.348 ( 68)  -0.089 ( 67)  -0.201 ( 51)   457     1.236
  43   -0.162 ( 10)   0.791 (312)  -0.573 ( 53)   0.142 ( 46)  -0.321 ( 37)   458     1.369
  44    0.293 ( 65)  -0.145 (  ?)  -0.228 ( 15)   0.664 (291)   0.237 ( 31)   456     0.892
  45   -0.124 ( 30)   0.139 ( 23)  -0.290 ( 79)   0.323 (299)  -0.469 ( 28)   459     1.292
  46    0.849 (308)  -0.751 ( 25)  -0.263 ( 29)  -0.260 ( 90)  -0.072 (  7)   459     1.600
  47   -0.136 ( 43)   0.764 (302)  -0.119 ( 54)  -0.194 ( 30)  -0.001 ( 30)   459     0.958
  48    0.483 ( 56)   0.262 ( 85)  -0.889 ( 38)  -0.036 ( 45)   0.871 (231)   455     1.760
  49    0.0?0 ( 96)  -0.351 ( 16)   0.183 ( 19)  -0.419 ( 35)   0.756 (294)   460     1.175
  50    0.798 (269)   0.153 ( 19)  -0.634 ( 20)   0.151 ( 84)  -0.099 ( 63)   455     1.432
  51    0.11? ( 76)  -0.260 ( 47)   0.312 ( 55)   0.150 ( 68)   0.909 (202)   443     1.169
  52    0.195 ( 60)   0.778 (239)   0.035 ( 71)   0.206 ( 21)   0.177 ( 53)   449     0.743
  53    0.376 ( 94)   0.193 ( 34)  -0.013 ( 26)   0.918 (180)   0.040 (125)   459     0.931
  54    0.817 (177)   0.256 ( 75)   0.282 ( 32)   0.221 (108)   0.051 (  9)   451     0.766
  55   -0.043 ( 20)  -0.042 ( 45)  -0.052 (174)  -0.455 ( 18)   1.157 (201)   458     1.612
  56    0.256 ( 70)   0.236 (100)  -0.289 ( 80)   1.354 (128)   0.247 ( 77)   455     1.643

TABLE 6-1 (Continued) : Test J2, Junior High School Grade 2

                                  Alternative                                         Largest
 Item       1             2             3             4             5        Total  Discrepancy

   1   -0.247 (145)  -0.901 ( 11)  -1.14? ( 19)  -1.354 ( 11)  -0.744 ( 35)   221     1.107
   2       --            --            --            --            --          --       --
   3   -0.667 ( 28)  -0.660 ( 45)  -0.639 ( 42)  -0.834 ( 16)  -0.224 ( 87)   218     0.610
   4   -0.403 ( 51)  -0.963 ( 30)  -1.036 ( 23)  -0.289 (115)  -0.943 (  2)   221     0.747
   5   -1.126 ( 14)  -1.573 (  2)  -1.070 ( 10)  -1.091 ( 18)  -0.334 (177)   221     1.239
   6   -0.239 (125)  -0.946 (  ?)  -0.607 ( ?2)  -0.391 ( 32)  -0.978 ( 25)   220     0.739
   7   -2.0?9 (  1)  -0.269 (153)  -1.365 ( 24)  -0.671 ( 30)  -0.946 ( 13)   221     1.320
   8   -0.761 ( 37)  -1.205 ( 12)  -0.589 (  6)  -0.376 (156)  -0.362 ( 10)   221     0.329
   9   -1.259 ( 10)  -0.746 (  9)  -1.098 ( 21)  -0.312 (172)  -1.428 (  8)   220     1.116
  10   -0.194 (141)  -1.057 (  ?)  -0.?50 ( 13)  -1.096 ( 47)  -0.648 (  4)   221     0.902
  11   -1.035 ( 22)  -0.253 (143)  -0.801 ( 26)  -1.059 (  7)  -0.924 ( 22)   220     0.806
  12   -0.681 ( 23)  -0.?83 ( 23)  -1.551 ( 10)  -1.113 ( 16)  -0.251 (147)   221     1.300
  13   -0.597 ( 50)  -1.016 (  6)  -0.777 ( 21)  -1.277 ( 13)  -0.302 (131)   221     0.975
  14   -0.227 (134)  -0.860 ( 13)  -1.523 (  9)  -0.646 ( 34)  -1.023 ( 30)   220     1.296
  15   -0.766 ( 34)  -1.045 ( 18)  -0.845 ( 26)  -0.974 ( 36)  -0.061 (107)   221     0.934
  16   -0.764 ( 36)  -0.093 ( 87)  -0.571 ( 54)  -1.369 ( 11)  -0.734 ( 29)   217     1.276
  17   -0.704 ( 52)  -0.373 ( 21)  -0.358 (  5)  -0.123 ( 85)  -0.842 ( 58)   221     0.730
  18   -0.139 (109)  -0.745 ( 33)  -0.731 ( 31)  -0.929 ( 39)  -0.291 (  7)   219     0.740
  19   -1.012 (  5)  -0.?03 ( 3?)  -0.875 ( 68)  -1.139 (  7)   0.148 ( 33)   221     1.287
  20   -0.923 ( 46)  -0.805 ( 38)  -0.948 ( 46)   0.304 ( 67)  -0.507 ( 24)   221     1.252

TABLE 6-1 (Continued) : Test J2, Junior High School Grade 3

                                  Alternative                                         Largest
 Item       1             2             3             4             5        Total  Discrepancy

   1    0.161 (436)  -0.?3? ( 30)  -0.737 ( 25)  -1.099 ( 19)  -0.374 ( 63)   573     1.260
   2       --            --            --            --            --          --       --
   3   -0.312 ( 54)  -0.287 ( 93)  -0.373 ( 97)  -0.436 ( 63)   0.351 (260)   567     0.837
   4   -0.02? ( 83)  -0.848 ( 77)  -0.252 ( 38)   0.181 (362)  -0.709 ( 12)   572     1.029
   5   -0.763 ( 30)  -0.766 (  7)  -0.364 ( 19)  -0.611 ( 43)   0.107 (475)   574     0.971
   6    0.221 (371)  -0.722 (  7)  -0.267 ( 96)  -0.675 ( 49)  -0.801 ( 45)   568     1.022
   7   -0.?97 ( 10)   0.175 (441)  -1.125 ( 45)  -0.339 ( 50)  -0.870 ( 24)   570     1.300
   8   -0.43? ( 55)  -0.966 ( 31)  -0.448 ( 14)   0.100 (457)  -0.272 ( 14)   571     1.066
   9   -1.089 ( 32)  -0.368 ( 47)  -0.828 ( 67)   0.252 (407)  -0.780 ( 17)   570     1.341
  10    0.117 (473)  -1.019 ( 15)  -0.229 ( 28)  -1.035 ( 31)   0.022 (  4)   571     1.132
  11   -0.??? ( 2?)   0.264 (389)  -0.750 ( 69)  -0.666 ( 35)  -0.619 ( 43)   572     1.014
  12   -0.478 ( 33)  -0.311 ( 87)  -1.394 ( 10)  -0.754 ( 33)   0.203 (407)   572     1.397
  13   -0.?9? (107)  -0.888 ( 16)  -0.366 ( 29)  -1.342 ( 14)   0.216 (407)   573     1.558
  14    0.241 (387)  -0.367 ( 22)  -1.527 ( 12)  -0.382 ( 84)  -0.?24 ( 66)   571     1.768
  15   -0.610 ( 67)  -0.853 ( 27)  -0.582 ( 79)  -0.638 ( 69)   0.441 (319)   561     1.294
  16   -0.629 ( 58)   0.264 (364)  -0.499 ( 75)  -0.344 ( 14)  -0.555 ( 58)   569     0.893
  17   -0.277 (109)   0.166 ( 42)  -0.469 ( 30)   0.351 (259)  -0.346 (132)   572     0.897
  18    0.383 (294)  -0.380 ( 65)  -0.418 ( 80)  -0.548 (115)  -0.579 ( 11)   565     0.962
  19   -0.943 ( 15)  -0.??2 ( 87)  -0.651 (136)  -0.789 (  9)   0.439 (325)   572     1.382
  20   -0.524 ( 73)  -0.484 ( 74)  -0.770 ( 93)   0.692 (235)  -0.363 ( 94)   574     1.462

VII   Use of Index k* When Distractors Are in Full Work

It is obvious in Table 6-1 of the preceding chapter that for
these vocabulary items the knowledge or random guessing principle does
not work behind the examinee's behavior, for the mean values of θ̂
for the wrong answers are substantially different from one another
for most of the items. In cases like this, index k* , which was
introduced in Chapter 5 as a modification of Sato's number of
hypothetical, equivalent alternatives and used as an index for
invalidating three-parameter models, can be used as a measure of
desirability of the item for the group of examinees in question,
just as Sato's index is meant to be used. An additional merit
of index k* when it is used for this purpose will be that it can
be used directly, without depending upon the relationship with the
probability for the correct answer, p_k , which is illustrated
by Figure 2-1.

Table 7-1 presents the estimated entropy H* obtained
by (5.6), for each of the nineteen items and each of the four
groups of examinees, JH1, JH2(a), JH2(b) and JH3. The values
of index k* , which correspond to these H*'s in Table 7-1,
were obtained by (5.10) and are shown in Table 7-2.

We can see in these tables that thirteen out of the total
of nineteen items have higher values of H* , and hence of k* ,
for JH2(a) than for JH2(b). Since the subjects in these two
groups are of the same school year, i.e., the second year of junior
high school, this tendency may be related to the fact that for JH2(a)
these nineteen items were given at the end of the test and for
JH2(b) they were given at the beginning of the test. We can also
observe that, for some items, there exists a mild tendency that
the value of k* becomes greater as the school year increases, and,
for some others, this tendency is reversed. Items 39(3), 40(4),
44(8), 45(9), 47(11), 53(17) and 55(19) belong to the first category,
and items 37(1), 48(12), 49(13), 50(14), 51(15), 52(16) and 54(18)
are members of the second category. In spite of these mild
tendencies, however, the values of index k* are large, ranging,
approximately, from 3.42 to 4.99 , for all the examinee groups,
a result which indicates a high desirability of this subset of
test items for these groups of examinees.

TABLE 7-1

Entropy of Each of the Nineteen Vocabulary Items Based on
Each of the Four Subgroups, i.e., Junior High School,
Grades 1, 2, 2 and 3. For the First Two Subgroups
of Subjects Test J1 Was Used and for the Other Two
Subgroups Test J2 Was Used.

                        Subgroup
     Item        JH1      JH2(a)    JH2(b)     JH3

     37 (1)    1.55907   1.57572   1.51218   1.52080
     39 (3)    1.57359   1.57997   1.55547   1.58566
     40 (4)    1.41987   1.48141   1.39913   1.46917
     41 (5)    1.47880   1.52098   1.46885   1.48496
     42 (6)    1.50740   1.51576   1.50880   1.42679
     43 (7)    1.54070   1.51224   1.39256   1.49871
     44 (8)    1.43049   1.51333   1.41791   1.47934
     45 (9)    1.42195   1.49895   1.54177   1.52485
     46(10)    1.37234   1.36152   1.36544   1.39912
     47(11)    1.52673   1.58391   1.54137   1.57599
     48(12)    1.59254   1.57072   1.57317   1.43130
     49(13)    1.51299   1.40124   1.40700   1.32933
     50(14)    1.54630   1.46214   1.50665   1.43095
     51(15)    1.59962   1.59600   1.58320   1.55950
     52(16)    1.54651   1.54903   1.51294   1.51407
     53(17)    1.45244   1.46629   1.41821   1.48312
     54(18)    1.51192   1.45933   1.51052   1.46054
     55(19)    1.23002   1.27989   1.25075   1.30371
     56(20)    1.60838   1.60223   1.58595   1.60504

TABLE 7-2

Number of Hypothetical, Equivalent Alternatives of Each
of the Nineteen Vocabulary Items Based on Each of the
Four Subgroups, i.e., Junior High School, Grades 1,
2, 2 and 3. For the First Two Subgroups of Subjects
Test J1 Was Used and for the Other Two Subgroups
Test J2 Was Used.

                        Subgroup
     Item        JH1      JH2(a)    JH2(b)     JH3

     37 (1)    4.75440   4.83420   4.53660   4.57590
     39 (3)    4.82391   4.85480   4.73730   4.88252
     40 (4)    4.13659   4.39917   4.05166   4.34565
     41 (5)    4.38768   4.57672   4.34425   4.41479
     42 (6)    4.51496   4.55290   4.52130   4.16531
     43 (7)    4.66784   4.53688   4.02513   4.47592
     44 (8)    4.18076   4.54183   4.12850   4.39004
     45 (9)    4.14519   4.47701   4.67284   4.59447
     46(10)    3.94459   3.90212   3.91744   4.05162
     47(11)    4.60310   4.87397   4.67098   4.83551
     48(12)    4.91623   4.81011   4.82191   4.18412
     49(13)    4.54029   4.06023   4.08370   3.77850
     50(14)    4.69408   4.31519   4.51161   4.18267
     51(15)    4.95113   4.93326   4.87053   4.75646
     52(16)    4.69506   4.70690   4.54008   4.54521
     53(17)    4.27352   4.33314   4.12972   4.40669
     54(18)    4.53542   4.30307   4.52908   4.30829
     55(19)    3.42128   3.59625   3.49295   3.68295
     56(20)    4.99472   4.96410   4.88392   4.97805
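
The link between Tables 7-1 and 7-2, and the count of items favoring JH2(a), can be re-derived with a short script. The sketch assumes that (5.10) amounts to k* = exp(H*) with natural logarithms, an assumption that the tabled values appear to bear out; the list names are hypothetical.

```python
import math

# H* from Table 7-1 for items 37(1) through 56(20), in table order.
H_JH2A = [1.57572, 1.57997, 1.48141, 1.52098, 1.51576, 1.51224, 1.51333,
          1.49895, 1.36152, 1.58391, 1.57072, 1.40124, 1.46214, 1.59600,
          1.54903, 1.46629, 1.45933, 1.27989, 1.60223]
H_JH2B = [1.51218, 1.55547, 1.39913, 1.46885, 1.50880, 1.39256, 1.41791,
          1.54177, 1.36544, 1.54137, 1.57317, 1.40700, 1.50665, 1.58320,
          1.51294, 1.41821, 1.51052, 1.25075, 1.58595]

# Assumed form of (5.10): k* is the exponential of the entropy H*.
k_jh2a = [math.exp(h) for h in H_JH2A]
k_jh2b = [math.exp(h) for h in H_JH2B]

# Number of items whose k* is higher for JH2(a) than for JH2(b).
higher_in_a = sum(1 for ka, kb in zip(k_jh2a, k_jh2b) if ka > kb)
print(round(k_jh2a[0], 4), higher_in_a)
```

Because the exponential is monotone, the comparison gives the same count whether it is made on H* or on k*.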

We can observe a tendency that, regardless of the groups of
examinees, some items have higher values of k* than others, and
some other items have lower values of k* than others. Items
56(20), 51(15) and 39(3) exemplify the first category, and items
55(19) and 46(10) are members of the second category.

The mean and the standard deviation of the nineteen values
of k* for each of the four examinee groups were computed, and are
presented in Table 7-3. We can see that all the mean values are
between 4.39 and 4.51, and all the standard deviations are between
0.34 and 0.40, i.e., very close to one another, respectively.
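
The summary figures of Table 7-3 can be rechecked from the k* values of Table 7-2. The following is a minimal sketch in Python and is not part of the original analysis. The names `K_STAR` and `summarize` are hypothetical, the value 4.93326 for item 51(15) in JH2(a) is an assumed reading of a smudged digit, and the n-1 denominator for the standard deviation is also an assumption, chosen because it appears to reproduce the tabled value for JH1.

```python
from statistics import mean, stdev

# k* values transcribed from Table 7-2, items 37(1) through 56(20).
# Item 38(2) is absent, which leaves nineteen items per group.
K_STAR = {
    "JH1": [4.75440, 4.82391, 4.13659, 4.38768, 4.51496, 4.66784, 4.18076,
            4.14519, 3.94459, 4.60310, 4.91623, 4.54029, 4.69408, 4.95113,
            4.69506, 4.27352, 4.53542, 3.42128, 4.99472],
    # The fourteenth entry (4.93326) is an assumed reading of a smudged digit.
    "JH2(a)": [4.83420, 4.85480, 4.39917, 4.57672, 4.55290, 4.53688, 4.54183,
               4.47701, 3.90212, 4.87397, 4.81011, 4.06023, 4.31519, 4.93326,
               4.70690, 4.33314, 4.30307, 3.59625, 4.96410],
    "JH2(b)": [4.53660, 4.73730, 4.05166, 4.34425, 4.52130, 4.02513, 4.12850,
               4.67284, 3.91744, 4.67098, 4.82191, 4.08370, 4.51161, 4.87053,
               4.54008, 4.12972, 4.52908, 3.49295, 4.88392],
    "JH3": [4.57590, 4.88252, 4.34565, 4.41479, 4.16531, 4.47592, 4.39004,
            4.59447, 4.05162, 4.83551, 4.18412, 3.77850, 4.18267, 4.75646,
            4.54521, 4.40669, 4.30829, 3.68295, 4.97805],
}


def summarize(values):
    """Return (mean, sample standard deviation) of one group's k* values."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    for group, values in K_STAR.items():
        m, s = summarize(values)
        print(f"{group:7s} mean={m:.4f}  s.d.={s:.4f}")
```

Keeping the four groups in one dictionary makes it easy to compare the printed means with the range 4.39 to 4.51 quoted above.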

As additional information, the product-moment correlation
coefficient of the k*'s, which are shown in Table 7-2, was computed
for each pair of examinee groups, and the result is presented in
Table 7-4. We can see that these values are fairly large and
positive, as we can expect from Table 7-2.
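
As an illustration of how one entry of Table 7-4 arises, the sketch below computes the product-moment correlation between the JH1 and JH3 columns of Table 7-2. It is a hedged example rather than the author's program: the function name `pearson` and the variable names are hypothetical, and the data are transcribed from the table as read here.

```python
import math

# k* values for items 37(1) through 56(20), transcribed from Table 7-2.
JH1 = [4.75440, 4.82391, 4.13659, 4.38768, 4.51496, 4.66784, 4.18076,
       4.14519, 3.94459, 4.60310, 4.91623, 4.54029, 4.69408, 4.95113,
       4.69506, 4.27352, 4.53542, 3.42128, 4.99472]
JH3 = [4.57590, 4.88252, 4.34565, 4.41479, 4.16531, 4.47592, 4.39004,
       4.59447, 4.05162, 4.83551, 4.18412, 3.77850, 4.18267, 4.75646,
       4.54521, 4.40669, 4.30829, 3.68295, 4.97805]


def pearson(x, y):
    """Product-moment correlation coefficient of two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(f"r(JH1, JH3) = {pearson(JH1, JH3):.5f}")
```

The same function applied to every pair of columns would fill in the whole symmetric matrix.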

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TABLE 7-3 

Mean and Standard Deviation (s.d.) of the
Index k* for the Nineteen Vocabulary
Items, for Each of the Four Examinee
Groups.

Examinee
Group       Mean      s.d.

JH1         4.4832    0.3944
JH2(a)      4.5038    0.3659
JH2(b)      4.3931    0.3759
JH3         4.3976    0.3465
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TABLE 7-4 

Product-Moment Correlation Coefficient of the Index k*
for Each Pair of the Four Examinee Groups.

            JH1       JH2(a)    JH2(b)    JH3

JH1         1.00000   0.82705   0.82711   0.60447
JH2(a)      0.82705   1.00000   0.85120   0.85770
JH2(b)      0.82711   0.85120   1.00000   0.71444
JH3         0.60447   0.85770   0.71444   1.00000
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The result of the principal factor analysis of the correlation 
matrix, Table 7-4, with the largest correlation coefficient of each
row or column as the first estimate of the communality and using
three iterative reestimations of the communalities, provides us with
the eigenvalues 3.237, 0.266, 0.044 and -0.011. Since the
correlation matrix, with communalities as the principal diagonal
elements, is positive semi-definite, the negative eigenvalue is
due to error, resulting mainly from the inaccuracy of the
estimation of the communalities. The final communality estimates
are approximately 0.863, 0.999, 0.862 and 0.833, respectively.
We can say from this result that a strong, dominating general
factor exists behind the four sets of k*'s, since the first
eigenvalue, 3.237, is by far the largest, and the other eigenvalues
are close to zero. The first factor loadings for the four examinee
groups, which are the correlation coefficients between this general
factor and the separate sets of k*'s, respectively, turned out
to be 0.868, 0.983, 0.905 and 0.836.
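
The general idea of principal factoring with re-estimated communalities can be sketched as follows. This is a simplified illustration under stated assumptions, not a reconstruction of the program actually used: only the first factor is extracted (by power iteration), and the communalities are re-estimated from that single factor, so the numbers it produces need not coincide with the reported 3.237 or with the reported loadings. The function names are hypothetical.

```python
import math

# Correlation matrix of Table 7-4 (order: JH1, JH2(a), JH2(b), JH3).
R = [
    [1.00000, 0.82705, 0.82711, 0.60447],
    [0.82705, 1.00000, 0.85120, 0.85770],
    [0.82711, 0.85120, 1.00000, 0.71444],
    [0.60447, 0.85770, 0.71444, 1.00000],
]


def dominant_eigenpair(a, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix
    with positive entries, found by power iteration."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * av[i] for i in range(n))
    return eigenvalue, v


def first_principal_factor(r, reestimations=3):
    """First eigenvalue and loadings of the reduced correlation matrix."""
    n = len(r)
    # First communality estimate: largest off-diagonal correlation per row.
    h2 = [max(r[i][j] for j in range(n) if j != i) for i in range(n)]
    for _ in range(reestimations + 1):
        reduced = [[h2[i] if i == j else r[i][j] for j in range(n)]
                   for i in range(n)]
        eigenvalue, v = dominant_eigenpair(reduced)
        loadings = [math.sqrt(eigenvalue) * x for x in v]
        # One-factor re-estimate of the communalities (an assumption).
        h2 = [loading * loading for loading in loadings]
    return eigenvalue, loadings


if __name__ == "__main__":
    eigenvalue, loadings = first_principal_factor(R)
    print(f"first eigenvalue: {eigenvalue:.3f}")
    print("loadings:", ", ".join(f"{x:.3f}" for x in loadings))
```

Even in this reduced form, the first eigenvalue accounts for most of the trace of the reduced matrix, which is the pattern the text describes as a dominating general factor.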

These facts indicate that the four examinee groups are fairly
similar to one another with respect to the configuration of the
values of k*, as far as these nineteen vocabulary test items are
concerned.
VIII Proposal of a New Family of Models for the Multiple-Choice
Item

Throughout the history of mental measurement, the multiple-choice
item has been treated as a "poor image of the free-response
item," and very little has been accomplished in pursuing
its theoretical advantages, rather than its handicaps. Most
researchers these days mechanically adopt the three-parameter
logistic model for research based on the multiple-choice
item, without even trying to validate the model. As long as
they continue doing this, we shall never be able to expect any
progress in this area of science, in spite of the fact that more
and more research materials and published papers are accumulated
year by year.

One of the author's purposes in pursuing the
method of estimating the operating characteristics without assuming
any mathematical model a priori (Samejima, 1977b, 1977c, 1978a,
1978b, 1978c, 1978d, 1978e, 1978f) has been to approach the operating
characteristics of distractors, which are completely neglected
by the users of three-parameter models. While this approach is
undoubtedly more scientific than any others, it will be desirable
to consider new types of models, which reflect the psychological
reality behind the examinee's behavior in the multiple-choice
situation far better than three-parameter models and the knowledge
or random guessing principle.

The research on vocabulary measurement by Shiba
and others should be credited for the fact that they did not
accept the fashionable three-parameter logistic model blindly, as
many other researchers do, and, moreover, they tried to make full
use of the information given by the distractors, to the extent that
they used it for branching examinees in tailored testing. As long
as we treat the multiple-choice item as a binary item, it will be
a poor substitute for the free-response item, one which is contaminated
by noise, or guessing. If we make use of the information given
by the distractors, however, the multiple-choice item can be more
informative than the free-response item, and will no longer be a
poor image of the free-response item.

The family of models that will be proposed in this chapter
is related to the graded response model (Samejima, 1969, 1972),
in which an item is scored into more than two response categories.
Let $x_g$ be the graded item score, which assumes integers, 0
through $m_g$, and $P_{x_g}(\theta)$ be its operating characteristic. The
graded response level can be classified into the homogeneous and
the heterogeneous cases (Samejima, 1972), and we can name the
normal ogive model (Samejima, 1972, 1973) and the logistic model
(Samejima, 1972) as models in the homogeneous case, and Bock's
multi-nomial response model (Bock, 1972; Samejima, 1972) as an
example in the heterogeneous case. In these models, the operating
characteristic of the item response category is defined, respectively,
as follows.



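(8.1)  $P_{x_g}(\theta) = (2\pi)^{-1/2} \int_{a_g(\theta - b_{x_g+1})}^{a_g(\theta - b_{x_g})} e^{-u^2/2} \, du$ ,

(8.2)  $P_{x_g}(\theta) = [1 + \exp\{-D a_g(\theta - b_{x_g})\}]^{-1} - [1 + \exp\{-D a_g(\theta - b_{x_g+1})\}]^{-1}$ ,

(8.3)  $P_{x_g}(\theta) = \exp\{\alpha_{x_g}\theta + \beta_{x_g}\} \, [\sum_{s=0}^{m_g} \exp\{\alpha_s\theta + \beta_s\}]^{-1}$ .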

In both the normal ogive and the logistic models, i.e., in (8.1) and
(8.2), the item parameter $a_g$ is a positive number, and the item
response parameter $b_{x_g}$ satisfies the relationship such that

(8.4)  $-\infty = b_0 < b_1 < b_2 < \cdots < b_{m_g} < b_{m_g+1} = \infty$ .

In the latter, D is a positive number which assumes 1.7 when the
logistic model is used as a substitute for the normal ogive model.
In Bock's multi-nomial model, one of the item response parameters,
$\alpha_{x_g}$, satisfies the inequality

(8.5)  $\alpha_0 \le \alpha_1 \le \alpha_2 \le \cdots \le \alpha_{m_g}$ .
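
The role of the constant D = 1.7 can be checked numerically. The sketch below, which is an illustration added here and not part of the original report, scans a grid of values and reports the largest absolute gap between the normal ogive and the logistic function scaled by D. The function names and the grid limits are hypothetical choices.

```python
import math

D = 1.7  # scaling constant quoted in the text


def normal_ogive(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic(z):
    """Logistic function with the argument scaled by D."""
    return 1.0 / (1.0 + math.exp(-D * z))


def max_abs_gap(lo=-6.0, hi=6.0, steps=12000):
    """Largest |normal_ogive(z) - logistic(z)| over an evenly spaced grid."""
    gap = 0.0
    for k in range(steps + 1):
        z = lo + (hi - lo) * k / steps
        gap = max(gap, abs(normal_ogive(z) - logistic(z)))
    return gap


if __name__ == "__main__":
    print(f"largest gap on the grid: {max_abs_gap():.5f}")
```

On this grid the two curves stay within one hundredth of each other, which is why (8.2) can stand in for (8.1) when D is set to 1.7.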

Suppose that the multiple-choice item g is constructed in
such a way that all the main, plausible answers are covered by the
alternatives, in addition to the correct answer. Suppose, further,
that no guessing is involved in the examinee's behavior in answering
item g. Then the examinee will either be attracted to one of
the alternatives, or will have no idea at all as to its answer.
Arrange all the distractors in the order of their plausibility,
and give them the numbers 1 through $(m_g - 1)$ in ascending order.
The number assigned to the correct answer is $m_g$, or m for
simplicity, and the one assigned to the "no idea at all" category
is 0. In such a situation, the operating characteristic of the
graded response category can be used as the operating characteristic
of the alternative, treating "no answer" as the additional alternative,
to which the item score is 0.

In practice, however, because of the pressure of testing,
it is rather unlikely that the examinee will leave the item unanswered
even when he has "no idea at all." For this reason, we shall now
assume that the examinee guesses randomly when he is not attracted
by the plausibility of any alternative. Thus we shall deal with
the $m_g$ alternatives as the graded response categories, 1 through
$m_g$, and we can write for the operating characteristic of the
alternative

(8.6)  $P_{x_g}(\theta) = \Psi_{x_g}(\theta) + (1/m_g) \, [1 - \sum_{s=1}^{m_g} \Psi_s(\theta)]$ ,   $x_g = 1, 2, \ldots, m_g$ ,

where $\Psi_{x_g}(\theta)$ is the operating characteristic of the alternative
which is numbered $x_g$, when no guessing is involved. Thus we
can use, for $\Psi_{x_g}(\theta)$, one of the $P_{x_g}(\theta)$'s defined by (8.1), (8.2) and (8.3),
or a similar operating characteristic of the graded response
category with a sound rationale behind it, depending upon the
nature of the item and the set of alternatives.

For the purpose of illustration, we shall use the normal ogive
model for $\Psi_{x_g}(\theta)$, with $a_g = 1.5$ and $b_{x_g}$'s of -2.0, -1.0,
0.0, 1.0 and 2.0 for $x_g = 1, 2, 3, 4, 5$, respectively. Figure 8-1
presents the operating characteristics of the $(m_g + 1)$ alternatives
obtained by (8.1), when no guessing is involved and "no answer"
is treated as the additional alternative, or category 0.




FIGURE 8-1

[Horizontal axis: latent trait θ. Parameters: a_g = 1.5, b_1 = -2.0, b_2 = -1.0, b_3 = 0.0, b_4 = 1.0 and b_5 = 2.0.]
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In this example, the operating characteristics of the four distractors 
are unimodal, with -1.5, -0.5, 0.5 and 1.5 as the modal points,
respectively. Figure 8-2 presents the operating characteristics
of the five alternatives when guessing is involved, which are
given by (8.6) with $\Psi_{x_g}(\theta)$ replaced by $P_{x_g}(\theta)$ given in (8.1).
We can see that, unlike the operating characteristics when no guessing
is involved, these curves have the common asymptote, 1/5, when
θ approaches negative infinity. To compare the two operating
characteristics of each alternative more clearly, Figure 8-3 presents
the two curves for each alternative in one graph, with the dotted
line for the one without guessing, and the solid line for the one
with guessing.
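
The example can be reproduced numerically. The following sketch is offered as an illustration under the stated parameter values (a_g = 1.5 and b = -2.0, -1.0, 0.0, 1.0, 2.0) and is not the author's program. The function names `psi`, `p_with_guessing` and `modal_point` are hypothetical, and the grid search for the modal point is a simple device chosen for brevity.

```python
import math

A_G = 1.5                          # item discrimination parameter a_g
B = [-2.0, -1.0, 0.0, 1.0, 2.0]    # b_1 through b_5
M_G = len(B)                       # number of alternatives m_g


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def psi(x, theta):
    """Operating characteristic of category x (0 through m_g) without
    guessing, following the normal ogive form with b_0 = -inf and
    b_(m_g+1) = +inf."""
    lower = 1.0 if x == 0 else phi(A_G * (theta - B[x - 1]))
    upper = 0.0 if x == M_G else phi(A_G * (theta - B[x]))
    return lower - upper


def p_with_guessing(x, theta):
    """Operating characteristic of alternative x (1 through m_g) when the
    'no idea at all' examinees guess at random among the m_g alternatives."""
    attracted = sum(psi(s, theta) for s in range(1, M_G + 1))
    return psi(x, theta) + (1.0 - attracted) / M_G


def modal_point(x, lo=-4.0, hi=4.0, steps=8000):
    """Grid value of theta at which psi(x, theta) is largest."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda t: psi(x, t))


if __name__ == "__main__":
    for x in range(1, M_G):
        print(f"distractor {x}: modal point near {modal_point(x):.2f}")
    for x in range(1, M_G + 1):
        print(f"alternative {x} at theta = -10: {p_with_guessing(x, -10.0):.4f}")
```

Because the guessing term is shared equally, the five curves always sum to one, and each tends to 1/5 at very low values of θ, where nearly every examinee falls into the "no idea at all" category.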

The family of models presented by (8.6) seems reasonable,
in the sense that it considers both the information given by the
distractors and the noise caused by random guessing. Its behavior
will be investigated further, and will be discussed in a separate
paper.

It is interesting to note that the use of the normal ogive
model and its logistic approximation in the research on vocabulary
measurement conducted by Shiba and others can be justified by
the new family of models. As we can see in the fifth graph of
Figure 8-3, when the parameter $b_{m_g}$ is as distant from $b_1$ as
in this example, the operating characteristic of the correct answer
is practically the same as the item characteristic function of
the normal ogive model on the dichotomous response level, except
for the additional "tail" on the lower levels of ability. If
this is the case with all the items in the test and the ability
distribution of our examinees does not include the lower levels of
θ where these tails lie, we can approximate the operating
characteristic of the correct answer by the normal ogive model
on the dichotomous response level, and use the tetrachoric correlation
coefficient, the logistic approximation and so on, just as Shiba
and others did.

FIGURE 8-2

Operating Characteristics of Five Alternatives Following the Normal Ogive Model with
Guessing Effect. The Parameters Are: a_g = 1.5, b_1 = -2.0, b_2 = -1.0, b_3 = 0.0,
b_4 = 1.0 and b_5 = 2.0.

FIGURE 8-3

Comparison of the Two Operating Characteristics in the Normal Ogive Model,
for Each Alternative, x_g = 1 through 5.
IX Discussion and Conclusions

We have introduced Sato's number of hypothetical, equivalent
alternatives, and defined its modification, index k*, as a measure
for invalidating the three-parameter logistic, or normal ogive, model.
We have also introduced Shiba's research on the measurement of
vocabulary and the construction of a tailored test, using the
information given by distractors. Various observations and
discussions have been made concerning the three-parameter models
and item distractors, the validation of mathematical models, and
so forth. Finally, a new family of models for the multiple-choice
item, which formulates both the operating characteristics of
distractors and the effect of random guessing, has been proposed.

There is a tendency that researchers restrict their ideas 
within the tradition of their own culture. Thus they tend to 
accept whatever is familiar to them, what is fashionable among 
other researchers in their culture, and so on, without feeling the 
necessity of validating the ideas and mathematical models in 
relation with their specific data and psychological reality. 
The virtue of doubt can be obtained if they shift their attention 
to what is going on outside of their own culture and climate, 
and try to think what is really right. 

Three-parameter models for the multiple-choice item have
been too readily accepted among psychometricians and applied
psychologists, and they have been using the models without trying
to validate them. Unless we correct this wrong orientation,
psychology will never make any progress, regardless of the fact
that more data are accumulated and more papers are published year
by year. In the author's opinion, psychology has not yet established
itself as a science, and we need to do that by putting ourselves on
the right track of research. In so doing, the validation of
mathematical models is certainly one of the most important things.

The departure from the tradition should also be made in the
treatment of the multiple-choice item. Instead of trying to handle
the multiple-choice item as a "blurred" substitute for the free-response
item, we must make full use of its advantage, which the
free-response item does not have. The operating characteristics
of the distractors of the multiple-choice item will add more
information about the examinee's ability level. We must set
a criterion for the quality of multiple-choice items from this
aspect also.
TABLE A-1

Alternatives Selected by Five Hundred Hypothetical Examinees Following the Three-Parameter Normal
Ogive Model, for Each of the Five Hypothetical Items.
(1=A, 2=B, 3=C, 4=D and 5=E.)

[Table body not legible in this copy.]
TABLE A-2

Frequency Ratio, P_j, of Each of the Five Alternatives and the
Estimated Probability, P*_R, for the Correct Answer with Which
the Examinee Selects the Correct Answer by Random Guessing at
the Maximum, for Each of the Nineteen Vocabulary Items.
Junior High School, Grade 1, for Test J1.

[Table body not legible in this copy. Each item has a row of relative
frequencies and a row of modified relative frequencies.]

TABLE A-2 (Continued): Junior High School, Grade 2, for Test J1.

[Table body not legible in this copy.]

TABLE A-2 (Continued): Junior High School, Grade 2, for Test J2.

[Table body not legible in this copy.]

TABLE A-2 (Continued): Junior High School, Grade 3, for Test J2.

[Table body not legible in this copy.]



APPENDIX III



RESEARCHERS OF THE UNIVERSITY OF TOKYO



Dr. Sukeyori Shiba 
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Mr. Tomokazu Haebara 
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The University of Iowa 
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Dr. Takahiro Sato (Representative) 
Nippon Electric Co., Ltd.
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Tokyo Institute of Technology 
2-12-1 Ohokayama, Meguro-ku
Tokyo 152, Japan
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Phone: (06) 877-5111 
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PUBLICATIONS BY MEMBERS OF THE EDUCATIONAL TECHNOLOGY GROUP OF THE 
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I Papers in English 

[1] Kurata, M. and T. Sato.

Test construction system applying item statistics.
Proceedings of the International Conference on
Cybernetics and Society, 1978, 368-372.

[2] Sato, T.

Instructional data processing, approach to computer
managed instruction. NEC Research & Development, 1973,
29, 38-49.

[3] Sato, T. and M. Kurata.

Basic S-P score table characteristics. NEC Research
& Development, 1977, 47, 64-71.



Books in Japanese

[4] Sato, T. S-P table analysis: analysis and interpretation
of test scores. Tokyo: Meiji-Tosho Publishing Co., 1975.

[5] Sato, T. (Ed.). CMI system (computer managed instruction
system): Uses of computers in education. Denshi Tsushin
Other Publications in Japanese

[7] Kurata, M. and T. Sato. Simulation and analysis of the item
score table using an S-P score table model. Kodo Keiryogaku,

[8] Nagaoka, K. and H. Fujita. Analysis for speaking time in
discussion. Kodo Keiryogaku (Japan Behaviometrics), 1978,
6, 1-8.

[9] Sato, T. Engineering techniques the analysis of educational
data y: application of entropy. NEC Central Laboratories, 197

[10] Sato, T. Construction of a test and the S-P table. NEC Central

[12] Sato, T. How to make and use S-P tables as a method of
analyzing the test result. Shido to Hyoka (Guidance

[13] Sato, T. Item bank system: A set of test items and its

[14] Sato, T., M. Kurata and H. Ikeda. Estimation of statistical
characteristics of the educational tests. Denshi Tsushin
Gakkai (IECE), Educational Technology, 1978, 2, 27-30.

structure of instructional units using the Interpretive

[16] Takeya, M. and T. Sato. On the learning progress distribution
of programmed instruction. Kodo Keiryogaku (Japan

[17] Takeya, M. A property analysis of an item score matrix
in CMI systems. Trans. IECE, 1977, 50, 967-974.

[18] Takeya, M. Hierarchical structure analysis among instructional
objectives on student performance scores. Denshi
Tsushin Gakkai (IECE), Educational Technology, 1978, 7,

[19] Takeya, M. Application of an item structure analysis to an
S-P score table. Denshi Tsushin Gakkai (IECE),
Educational Technology, 1978, 12, 35-40.

[20] Takeya, M. On an expanded item relational structure analysis
and its application. Denshi Tsushin Gakkai (IECE),
Educational Technology, 1979, 1, 23-26.

[21] Takeya, M. Comparison of the item relational structure analysis
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of the Conference of Nippon Kodo Keiryo Gakkai (Japanese 
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